FIELD OF THE INVENTION

The present invention relates generally to a novel human gene and its derivatives and to
mammalian, animal, insect, nematode, avian and microbial homologues thereof. The present
invention further provides pharmaceutical compositions and diagnostic agents as well as genetic
molecules useful in gene replacement therapy and recombinant molecules useful in protein
replacement therapy.


BACKGROUND OF THE INVENTION 

Bibliographic details of the publications referred to by author in this specification are collected 
at the end of the description. 


The increasing sophistication of recombinant DNA technology is greatly facilitating research and
development in the medical and allied health fields. There is a growing need to develop
recombinant and genetic molecules for use in diagnosis and in conventional pharmaceutical
preparations as well as in gene and protein replacement therapies.






In work leading up to the present invention, the inventors sought to identify and clone human 
genes which might be useful as potential diagnostic and/or therapeutic agents. Molecules of 
particular interest targeted by the inventors were gene regulators including regulatory proteins, 
signal transducers and heat shock proteins. 






Gene expression generally requires interaction between a regulatory protein and an appropriate
recognition sequence of a target gene. In many cases, regulatory proteins comprise a domain or
motif which facilitates binding to DNA. One particular motif comprises small sequence units
repeated in tandem, with each unit folded about a zinc atom to form separate structural domains.
This motif is now referred to as a zinc finger domain. Such a domain is generally defined by the
number of cysteine (C) and histidine (H) residues.
In addition, knowledge of cellular interaction in the control of cell proliferation is essential in the
rational design of specific therapeutic strategies aimed at controlling proliferative disorders.
Such proliferative disorders include a range of cancers, inflammatory conditions and
atherosclerosis. An important aspect of cellular interaction is signal transduction from receptors
to intracellular transducers. One key signal transducer is Ras, which couples the receptors for
diverse extracellular signals to different effectors. Ras directly activates the downstream kinase
Raf, which in turn induces the mitogen activated protein kinase (MAPK) cascade.

Another regulatory mechanism involves heat shock proteins. The Escherichia coli heat shock
protein, DnaJ, is the founding member of a family of proteins which are associated with protein
folding, protein complex assembly and transit through subcellular components.

Prokaryotic and eukaryotic DnaJ homologues have a modular organisation consisting of a J
domain, a glycine-rich spacer, CXXCXGXG [SEQ ID NO:1] repeats and a C-terminal region
with no obvious sequence features, as well as additional sequences for protein targeting. The
J domain is anticipated to mediate interaction with heat shock 70 proteins (Hsp70) and consists
of some 70 amino acids, frequently located at the N-terminus of the protein.
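In the CXXCXGXG repeat, X stands for any amino acid residue. As a minimal sketch, assuming a protein sequence supplied as a one-letter string (the function name `find_cxxcxgxg` and the example sequence are hypothetical and are not taken from this specification), such repeats can be located with a regular expression:

```python
import re

# C, any two residues, C, any residue, G, any residue, G.
# Wrapping the pattern in a lookahead lets overlapping repeats be reported separately.
CXXCXGXG = re.compile(r"(?=(C..C.G.G))")


def find_cxxcxgxg(protein: str) -> list[int]:
    """Return the 1-based start position of every CXXCXGXG match in `protein`."""
    return [m.start() + 1 for m in CXXCXGXG.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical sequence containing two repeats.
    print(find_cxxcxgxg("MKCEKCHGSGAACQGCGGRG"))  # [3, 13]
```

A pattern search of this kind only flags candidate repeats; assigning them to a DnaJ-like module would still rely on alignment against known homologues.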

In accordance with the present invention, genes have been identified from the human genome
which encode proteins having a regulatory role. One gene, in accordance with the present
invention, encodes a protein with an N-terminal region resembling a zinc finger domain of a novel
type. Another gene encodes a protein involved in guanine nucleotide exchange factor (GEF)
signalling pathways. Yet another gene encodes a protein which is a heat shock protein or heat
shock-like protein which may have a role in tumour suppression.


SUMMARY OF THE INVENTION 

Throughout this specification, unless the context requires otherwise, the word "comprise", or
variations such as "comprises" or "comprising", will be understood to imply the inclusion of a
stated element or integer or group of elements or integers but not the exclusion of any other
element or integer or group of elements or integers.
Sequence identity numbers (SEQ ID NOs.) for nucleotide and amino acid sequences referred to
in the subject specification are defined after the bibliography. A summary of SEQ ID NOs. is
also given in Table 1.

One aspect of the present invention contemplates an isolated nucleic acid molecule comprising
a sequence of nucleotides encoding or complementary to a sequence encoding an amino acid
sequence having homology to a regulator of gene expression or a derivative of said gene
regulator.

Another aspect of the present invention provides an isolated nucleic acid molecule comprising
a sequence of nucleotides encoding or complementary to a sequence encoding a regulator of
gene expression wherein said regulator comprises a zinc finger domain of an (HC3)2 type.

Yet another aspect of the present invention is directed to an isolated nucleic acid molecule
comprising a sequence of nucleotides or a complementary form thereof selected from:

(i) a nucleotide sequence set forth in SEQ ID NO:2;

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3;

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence
of (i) or (ii); and

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C
to the nucleotide sequence set forth in (i), (ii) or (iii).



The nucleotide sequence set forth in SEQ ID NO:2 defines the gene, mcg4. This gene encodes
a product, MCG4, having an amino acid sequence set forth in SEQ ID NO:3.

Even yet another aspect of the present invention provides a genetic construct comprising a vector
portion and an animal, more particularly a mammalian and even more particularly a human mcg4
gene portion, which mcg4 gene portion is capable of encoding an MCG4 polypeptide or a
functional or immunologically interactive derivative thereof.
Still yet another aspect of the present invention contemplates a method of detecting a condition
caused or facilitated by an aberration in mcg4, said method comprising determining the presence
of a single or multiple nucleotide substitution, deletion and/or addition or other aberration to one
or both alleles of said mcg4 wherein the presence of such a nucleotide substitution, deletion
and/or addition or other aberration may be indicative of said condition or a propensity to develop
said condition.

Even still a further aspect of the present invention relates to a method of detecting a condition
caused or facilitated by an aberration in mcg4, said method comprising screening for a single or
multiple amino acid substitution, deletion and/or addition to MCG4 wherein the presence of such
a mutation is indicative of said condition or of a propensity to develop said condition.

Another aspect of the present invention contemplates a method for detecting MCG4 or a
derivative thereof in a biological sample, said method comprising contacting said biological
sample with an antibody specific for MCG4 or its derivatives or homologues for a time and under
conditions sufficient for an antibody-MCG4 complex to form, and then detecting said complex.

A further aspect of the present invention contemplates an isolated nucleic acid molecule
comprising a sequence of nucleotides encoding or complementary to a sequence encoding an
amino acid sequence having homology to a guanine nucleotide exchange factor (GEF) or a
derivative thereof.

Yet another aspect of the present invention is directed to an isolated nucleic acid molecule
comprising a sequence of nucleotides or a complementary form thereof selected from:

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6;

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5
or 7;

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence
of (i) or (ii); and

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions to the
nucleotide sequence set forth in (i), (ii) or (iii).






The nucleotide sequence set forth in SEQ ID NO:4 or 6 defines the gene, mcg7. This gene 
encodes a product, MCG7, having an amino acid sequence set forth in SEQ ID NO:5 or 7. 



Even yet another aspect of the present invention provides a genetic construct comprising a vector
portion and an animal, more particularly a mammalian and even more particularly a human mcg7
gene portion, which mcg7 gene portion is capable of encoding an MCG7 polypeptide or a
functional or immunologically interactive derivative thereof.


Still yet another aspect of the present invention contemplates a method of detecting a condition
caused or facilitated by an aberration in mcg7, said method comprising determining the presence
of a single or multiple nucleotide substitution, deletion and/or addition or other aberration to one
or both alleles of said mcg7 wherein the presence of such a nucleotide substitution, deletion
and/or addition or other aberration may be indicative of said condition or a propensity to develop
said condition.

Even still a further aspect of the present invention relates to a method of detecting a condition
caused or facilitated by an aberration in mcg7, said method comprising screening for a single or
multiple amino acid substitution, deletion and/or addition to MCG7 wherein the presence of such
a mutation is indicative of said condition or of a propensity to develop said condition.

Another aspect of the present invention contemplates a method for detecting MCG7 or a
derivative thereof in a biological sample, said method comprising contacting said biological
sample with an antibody specific for MCG7 or its derivatives or homologues for a time and under
conditions sufficient for an antibody-MCG7 complex to form, and then detecting said complex.

Yet another aspect of the present invention contemplates an isolated nucleic acid molecule
comprising a sequence of nucleotides encoding or complementary to a sequence encoding an
amino acid sequence having homology to a heat shock protein or a heat shock-binding protein
or a derivative thereof.
Another aspect of the present invention is directed to an isolated nucleic acid molecule
comprising a sequence of nucleotides or a complementary form thereof selected from:

(i) a nucleotide sequence set forth in SEQ ID NO:8;

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9;

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence
of (i) or (ii); and

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C
to the nucleotide sequence set forth in (i), (ii) or (iii).


The nucleotide sequence set forth in SEQ ID NO:8 defines the gene, mcg18. This gene encodes
a product, MCG18, having an amino acid sequence set forth in SEQ ID NO:9.

Even yet another aspect of the present invention provides a genetic construct comprising a vector
portion and an animal, more particularly a mammalian and even more particularly a human
mcg18 gene portion, which mcg18 gene portion is capable of encoding an MCG18 polypeptide
or a functional or immunologically interactive derivative thereof.

Still yet another aspect of the present invention contemplates a method of detecting a condition
caused or facilitated by an aberration in mcg18, said method comprising determining the presence
of a single or multiple nucleotide substitution, deletion and/or addition or other aberration to one
or both alleles of said mcg18 wherein the presence of such a nucleotide substitution, deletion
and/or addition or other aberration may be indicative of said condition or a propensity to develop
said condition.


Even still a further aspect of the present invention relates to a method of detecting a condition
caused or facilitated by an aberration in mcg18, said method comprising screening for a single
or multiple amino acid substitution, deletion and/or addition to MCG18 wherein the presence of
such a mutation is indicative of said condition or of a propensity to develop said condition.


Another aspect of the present invention contemplates a method for detecting MCG18 or a
derivative thereof in a biological sample, said method comprising contacting said biological
sample with an antibody specific for MCG18 or its derivatives or homologues for a time and
under conditions sufficient for an antibody-MCG18 complex to form, and then detecting said
complex.

A summary of SEQ ID NOs. referred to in the subject specification is shown in Table 1.
TABLE 1
SUMMARY OF SEQ ID NOs.

SEQ ID NO.   DESCRIPTION

1            amino acid repeat sequence in DnaJ homologues
2            nucleotide sequence of mcg4
3            amino acid sequence of MCG4
4            nucleotide sequence of mcg7
5            amino acid sequence of MCG7
6            nucleotide sequence of mcg7 without the exon of nucleotides 183-288
7            amino acid sequence of MCG7 without the exon of nucleotides 183-288
8            nucleotide sequence of mcg18
9            amino acid sequence of MCG18
10-18        amino acid sequences identified using BESTFIT
19           sequence of pGEX and mcg7 junction
20           sequence of pGEX and mcg7 junction
21           nucleotide sequence of myc-tag/mcg7 junction
22           amino acid sequence corresponding to SEQ ID NO:21
23           nucleotide sequence of pGEX and mcg7 junction
24           amino acid sequence corresponding to SEQ ID NO:23
25-36        mcg7-specific oligonucleotides
37-45        mcg18-specific oligonucleotides

Single and three-letter abbreviations for amino acid residues are shown in Table 2.
TABLE 2

Amino Acid       Three-letter Abbreviation   One-letter Symbol

Alanine          Ala                         A
Arginine         Arg                         R
Asparagine       Asn                         N
Aspartic acid    Asp                         D
Cysteine         Cys                         C
Glutamine        Gln                         Q
Glutamic acid    Glu                         E
Glycine          Gly                         G
Histidine        His                         H
Isoleucine       Ile                         I
Leucine          Leu                         L
Lysine           Lys                         K
Methionine       Met                         M
Phenylalanine    Phe                         F
Proline          Pro                         P
Serine           Ser                         S
Threonine        Thr                         T
Tryptophan       Trp                         W
Tyrosine         Tyr                         Y
Valine           Val                         V
Any residue      Xaa                         X
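The two notations in Table 2 are interchangeable, so a motif such as SEQ ID NO:1 can be written either as CXXCXGXG or as Cys-Xaa-Xaa-Cys-Xaa-Gly-Xaa-Gly. A small lookup sketch follows; the function names and the hyphen separator are assumptions made for illustration only:

```python
# Three-letter to one-letter codes, as listed in Table 2.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",  # any residue
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues: str, sep: str = "-") -> str:
    """Convert e.g. 'Cys-Xaa-Xaa-Cys' to 'CXXC'; an unknown code raises KeyError."""
    return "".join(THREE_TO_ONE[code.strip().capitalize()] for code in residues.split(sep))


def to_three_letter(sequence: str, sep: str = "-") -> str:
    """Convert e.g. 'CXXC' to 'Cys-Xaa-Xaa-Cys'."""
    return sep.join(ONE_TO_THREE[aa] for aa in sequence.upper())
```

For example, `to_one_letter("Cys-Xaa-Xaa-Cys-Xaa-Gly-Xaa-Gly")` gives `"CXXCXGXG"`. Raising `KeyError` on an unrecognised code (such as a mis-read "Gin") is deliberate, so that transcription errors are not silently accepted.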
BRIEF DESCRIPTION OF THE FIGURES

Figure 1 is a representation of the nucleotide sequence [SEQ ID NO:2] and corresponding
amino acid sequence [SEQ ID NO:3] of mcg4.


Figure 2 is a representation of the alignment of the human MCG4 amino acid sequence with a 
translation of a partial murine expressed sequence tag (EST). 

Figure 3 is a representation of the alignment of the human MCG4 amino acid sequence with a
translation of a partial nematode EST.

Figure 4 is a diagrammatic representation showing a predicted structure of MCG4, where H and
C represent histidine and cysteine residues, respectively, X refers to any amino acid residue and
Zn represents zinc atoms.


Figure 5 is a representation of a sensitive sequence homology search for related cysteine-containing
motifs in another Caenorhabditis elegans protein.

Figure 6 is a representation showing that a related cysteine-containing motif is present in the
GATA-binding transcription factor from Saccharomyces pombe.

Figure 7 is a Northern blot showing expression of mcg4 in various cultured human cancer cell
lines. Lanes 1-5 show the hybridization signal from 15 µg of total RNA derived from H69 lung
carcinoma cells, JAM ovary carcinoma cells, BT20 breast carcinoma cells, HaCat transformed
keratinocytes and T24 bladder carcinoma cells, respectively.

Figure 8 is a representation of a partial alignment of mcg4 with human ESTs AA074703 and 
AA134788. 


WO 98/53061                                                        PCT/AU98/00380

Figure 9 is a representation of the partial nucleotide sequence alignment between human
(W32939) and mouse (AA242159) mcg4-like ESTs in the putative 5' UTR of the mcg4 cDNA.
The putative initiation codon is underlined and the region upstream represents the 5' UTR.

Figure 10 is a representation showing a MacVector alignment of MCG4 with forward translations
of ESTs AA134788 and AA074703. The nucleotide sequences are shown in Figure 8.

Figure 11 is a diagrammatic representation of the domains of MCG4:

zinc finger consensus: CX 2 HX 4 CX 2 CX 4 HX 2 CX 17 CX 2 CX 18 HX 2 CX Ig CX 2 C 
acidic domain consensus: 9/34 amino acids negatively charged, 0/34 positively charged 
basic domain consensus: 13/55 amino acids positively charged, 0/55 negatively charged 
leucine zipper domain consensus: UQIJQRXjUgL 

alternate "novel" leucine zipper-like motif where leucine would not be aligned along the 
surface of an alpha helix domain: (aa261) L^LXLX^XJL (aa 286). 
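The consensus lines for Figure 11 use two notations: a spacer notation in which "Xn" means n arbitrary residues (for example CX2HX4CX2C), and charge counts of the form "9/34 amino acids negatively charged". A minimal sketch of both follows. The helper names are hypothetical; treating only Asp and Glu as negative and only Lys and Arg as positive (His not counted) is an assumption, since the specification does not say which residues were counted.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Convert spacer notation such as 'CX2HX4CX2C' to a regular expression.

    Residue letters match literally; 'X' alone is one arbitrary residue and
    'Xn' is exactly n arbitrary residues. Spaces left over from typesetting
    (e.g. 'CX 2 HX 4') are ignored.
    """
    text = consensus.replace(" ", "").upper()
    tokens = re.findall(r"X\d*|[A-Z]", text)
    if "".join(tokens) != text:
        raise ValueError(f"unrecognised characters in {consensus!r}")
    parts = []
    for tok in tokens:
        if tok.startswith("X"):
            parts.append(".{%s}" % (tok[1:] or "1"))
        else:
            parts.append(tok)
    return "".join(parts)


def find_consensus(protein: str, consensus: str) -> list[int]:
    """Return 1-based start positions of (possibly overlapping) matches."""
    pattern = re.compile("(?=(%s))" % consensus_to_regex(consensus))
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


ACIDIC = frozenset("DE")  # Asp, Glu
BASIC = frozenset("KR")   # Lys, Arg; His deliberately not counted (assumption)


def charge_summary(segment: str) -> str:
    """Describe a segment in the 'n/N negatively charged' style used above."""
    seq = segment.upper()
    neg = sum(aa in ACIDIC for aa in seq)
    pos = sum(aa in BASIC for aa in seq)
    return f"{neg}/{len(seq)} negatively charged, {pos}/{len(seq)} positively charged"
```

For instance, `consensus_to_regex("CX 2 HX 4 CX 2 C")` yields `"C.{2}H.{4}C.{2}C"`, the first His-plus-three-Cys cluster of the zinc finger consensus, and `charge_summary("DEEAGS")` yields `"3/6 negatively charged, 0/6 positively charged"`.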






Figure 12 is a representation showing similarity of MCG7 with GEFs of various organisms.



Figure 13(a) is a representation of the nucleotide sequence [SEQ ID NO:4] and corresponding
amino acid sequence [SEQ ID NO:5] of mcg7. Nucleotides 183-288 are an alternatively spliced
exon (shown in lower case).


Figure 13(b) is a representation of the partial nucleotide sequence [SEQ ID NO:6] and
corresponding amino acid sequence [SEQ ID NO:7] of mcg7 but without the exon shown in Fig.
13(a). Amino acids have been numbered from the first methionine codon (underlined). The
cDNA molecules of Fig. 13(a) and Fig. 13(b) differ by the inclusion or exclusion of the exon
of nucleotides 183-288.






Figure 14 is a representation showing a comparison between MCG7 and a homologue from
Caenorhabditis elegans using the BESTFIT algorithm. In the figure, the following sequences
are underlined:

EF-HAND = PROSITE DATABASE NO. PDOC00018
1a nematode DVDEEDEVEDffiF [SEQ ID NO:10]
1b human DVDGDGHISQEEF [SEQ ID NO:11]
nematode DHDRDGFISQEEF [SEQ ID NO:12]
1c human DQNQDGCISREEM [SEQ ID NO:13]
nematode DVDMDGQISKDEL [SEQ ID NO:14]

GUANINE NT BINDING REGION = BLOCKS DATABASE NO. BL00720B

2 human HFVHVAEKIXQLQNFNTIJv4AWGGLSHSSISRLKETH [SEQ ID NO:15]
nematode KFVHVAKHLRKINNFNTLMSVVGGITHSSVARLAKTY [SEQ ID NO:16]

DAG-PE BINDING DOMAIN = PROSITE DATABASE NO. PDOC00379

3 human HNFQESNSLRPVACRHCKALILGIYKQGLKCRACGVNCHKQCKDRLSVEC [SEQ ID NO:17]
nematode HNFHFXITLTPTTCNHCWKLLWGILRQGFKCKDCGLAVHSCCKSNAVAEC [SEQ ID NO:18]

Figure 15 is a representation of an alignment of human and a partial (5' UTR and partial coding
sequence) murine mcg7 cDNA (GenBank Acc. No. W71787 and AA237373). The putative
initiation codon is underlined. The murine sequence represents a composite of 2 partial cDNA
sequences from the EST database (accession numbers W71787 and AA237373). Nucleotide
differences between human and murine sequences are shown in lower case lettering and identical
residues are indicated with asterisks.

Figure 16 is a representation of further 5' nucleotide and corresponding amino acid sequence for
human mcg7. Nucleotide positions 1-321 were derived from GenBank Acc. No. AC000134 and
nucleotides 322 onwards from Fig. 13(a). Two in-frame initiation codons are underlined.
Asterisks denote in-frame stop codons.

Figure 17 is a graphical representation of a GDP release assay. □ Experiment #1 (mean of
duplicates); ○ Experiment #2 (mean of duplicates). The exchange reaction contained 36 pmol
of GST-MCG7 (N-terminally truncated; encoded by Construct B in Fig. 18) and 1.6-12.8 pmol
of recombinant GST-N-Ras.GDP. Reaction time 6 min.
Estimated reaction constants:

Km = 2.1 µM, Vmax = 37 pmol/6 min/36 pmol [Expt #1]
Km = 15 µM, Vmax = 30.3 pmol/6 min/36 pmol [Expt #2]
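The Figure 17 legend reports Km and Vmax estimates but does not state how they were fitted. As a sketch only, assuming a Lineweaver-Burk (double-reciprocal) fit, which is one common choice, Km and Vmax can be recovered from paired substrate amounts and initial rates. The function names are hypothetical, and units are whatever the caller supplies. A direct nonlinear fit to the Michaelis-Menten equation is often preferred for noisy data, because the reciprocal transform magnifies errors at low substrate levels.

```python
def michaelis_menten(s: float, km: float, vmax: float) -> float:
    """Initial rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def lineweaver_burk_fit(substrate: list[float], rates: list[float]) -> tuple[float, float]:
    """Estimate (Km, Vmax) from a least-squares line through 1/v against 1/S.

    Since 1/v = (Km / Vmax) * (1/S) + 1/Vmax, the intercept gives 1/Vmax
    and the slope gives Km / Vmax.
    """
    if len(substrate) != len(rates) or len(substrate) < 2:
        raise ValueError("need at least two paired (S, v) points")
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


if __name__ == "__main__":
    # Synthetic, noise-free rates from chosen constants (not the assay data).
    s_values = [1.6, 3.2, 6.4, 12.8]
    v_values = [michaelis_menten(s, 2.1, 37.0) for s in s_values]
    print(lineweaver_burk_fit(s_values, v_values))
```

With noise-free synthetic input the fit returns the constants used to generate the data, which is a quick check that the rearrangement is correct.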



Figure 18 depicts various recombinant plasmids containing partial or full-length mcg7.



Figure 19 is a representation of the nucleotide sequence [SEQ ID NO:8] and corresponding
amino acid sequence [SEQ ID NO:9] of mcg18.



Figure 20 is a representation showing that MCG18 has partial homology to E. coli DnaJ.



Figure 21 is a representation showing that MCG18 has homology to two Caenorhabditis elegans
proteins.



Figure 22 is a representation showing that MCG18 has homology to a Saccharomyces pombe
protein.



Figure 23 is a representation showing homology of MCG18 to a Drosophila virilis protein.

Figure 24 is a representation showing homology of MCG18 to human DnaJ proteins
HDJ-2/HSDJ, HDJ-1/HSP40 and HSJ1.



Figure 25 is a representation of the nucleotide and corresponding amino acid sequence of
mcg18.



Figure 26 is a representation of homology between human and murine MCG18.



Figure 27 depicts nucleotide sequences corresponding to the 5' untranslated region of human
mcg18.
Figure 28 depicts a Northern blot showing expression of mcg18 transcripts in total RNA isolated
from various human cancer cell lines grown in culture. Lanes 1-5 contain 15 µg of RNA from
H69 lung carcinoma cells, JAM ovary carcinoma cells, BT20 breast carcinoma cells, HaCat
transformed keratinocytes and T24 bladder carcinoma cells, respectively.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The present invention provides an isolated nucleic acid molecule comprising a sequence of
nucleotides encoding or complementary to a sequence encoding an amino acid sequence having
homology to a regulator of gene expression or a derivative of said gene regulator.

More particularly, the present invention is directed to an isolated nucleic acid molecule
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a
regulator of gene expression wherein said regulator comprises a zinc finger domain of an (HC3)2
type.

Still more particularly, the present invention provides an isolated nucleic acid molecule
comprising a sequence of nucleotides or a complementary form thereof selected from:

(i) a nucleotide sequence set forth in SEQ ID NO:2;

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3;

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence
of (i) or (ii); and

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C
to the nucleotide sequence set forth in (i), (ii) or (iii).
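Clause (ii) reaches beyond SEQ ID NO:2 itself because the genetic code is degenerate: several codons can specify the same amino acid, so many different nucleotide sequences encode one amino acid sequence. A minimal sketch using the standard genetic code (the names `CODON_TABLE` and `translate` are illustrative, not part of the specification) shows two different coding sequences translating to the same peptide:

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate complete codons in reading frame 1; '*' marks a stop codon.

    RNA input (U) is accepted, and a trailing incomplete codon is ignored.
    """
    seq = nucleotides.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(translate("ATGTGTTGC"), translate("ATGTGCTGT"))  # MCC MCC
```

Only the first reading frame is translated here; a full analysis would also consider the other frames and the complementary strand.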

The present invention also provides an isolated nucleic acid molecule comprising a sequence of 
nucleotides encoding or complementary to a sequence encoding an amino acid sequence having 
homology to a guanine nucleotide exchange factor (GEF) or a derivative thereof. 


More particularly, the present invention is directed to an isolated nucleic acid molecule
comprising a sequence of nucleotides or a complementary form thereof selected from:

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6;

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5
or 7;

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence
of (i) or (ii); and

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C
to the nucleotide sequence set forth in (i), (ii) or (iii).

Another aspect of the present invention contemplates an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding an 
amino acid sequence having homology to a heat shock protein or a heat shock-binding protein 
or a derivative thereof. 


More particularly, the present invention is directed to an isolated nucleic acid molecule
comprising a sequence of nucleotides or a complementary form thereof selected from:

(i) a nucleotide sequence set forth in SEQ ID NO:8;

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9;

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence
of (i) or (ii); and

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C
to the nucleotide sequence set forth in (i), (ii) or (iii).


Preferably, the percentage similarity is at least about 50%. More preferably, the percentage 
similarity is at least about 60%. 

Reference herein to a low stringency at 42°C includes and encompasses from at least about 1%
v/v to at least about 15% v/v formamide and from at least about 1M to at least about 2M salt for
hybridisation, and at least about 1M to at least about 2M salt for washing conditions. Alternative
stringency conditions may be applied where necessary, such as medium stringency, which
includes and encompasses from at least about 16% v/v to at least about 30% v/v formamide and
from at least about 0.5M to at least about 0.9M salt for hybridisation, and at least about 0.5M
to at least about 0.9M salt for washing conditions, or high stringency, which includes and
encompasses from at least about 31% v/v to at least about 50% v/v formamide and from at least
about 0.01M to at least about 0.15M salt for hybridisation, and at least about 0.01M to at least
about 0.15M salt for washing conditions.
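The three stringency levels above are defined by a formamide range and a salt range, and for each level the same salt range is given for hybridisation and for washing. A sketch that maps a condition onto those levels is shown below. Reading each range as a closed interval, and returning `None` for values that fall between the stated ranges, are assumptions about how the "at least about" boundaries would be applied; the names are hypothetical.

```python
from typing import Optional

# level -> ((formamide % v/v low, high), (salt M low, high)), from the text above.
STRINGENCY_RANGES = {
    "low": ((1.0, 15.0), (1.0, 2.0)),
    "medium": ((16.0, 30.0), (0.5, 0.9)),
    "high": ((31.0, 50.0), (0.01, 0.15)),
}


def classify_stringency(formamide_pct: float, salt_molar: float) -> Optional[str]:
    """Return 'low', 'medium' or 'high', or None if the pair fits no stated range."""
    for level, ((f_lo, f_hi), (s_lo, s_hi)) in STRINGENCY_RANGES.items():
        if f_lo <= formamide_pct <= f_hi and s_lo <= salt_molar <= s_hi:
            return level
    return None
```

For example, `classify_stringency(10, 1.5)` returns `"low"`, while `classify_stringency(40, 1.0)` returns `None` because high formamide is paired with a salt level the text assigns to low stringency.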

The term "similarity" as used herein includes exact identity between compared sequences at the
nucleotide or amino acid level. Where there is non-identity at the nucleotide level, "similarity"
includes differences between sequences which result in different amino acids that are nevertheless
related to each other at the structural, functional, biochemical and/or conformational levels.
Where there is non-identity at the amino acid level, "similarity" includes amino acids that are
nevertheless related to each other at the structural, functional, biochemical and/or conformational
levels.
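The distinction between identity and similarity can be made concrete for two sequences that are already aligned without gaps. The sketch below uses the EF-hand pair 1c from Figure 14 (SEQ ID NO:13 and SEQ ID NO:14). The residue groups are illustrative assumptions, since the specification does not list which amino acids count as related; in practice the alignment itself would come from a program such as BESTFIT, which is not reproduced here.

```python
# Illustrative groups of related amino acids (an assumption for this example).
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG"]


def _related(a: str, b: str) -> bool:
    """True if a and b are identical or fall in the same group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def _check(seq1: str, seq2: str) -> None:
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty, aligned and the same length")


def percent_identity(seq1: str, seq2: str) -> float:
    """Percentage of identical positions in two gap-free aligned sequences."""
    _check(seq1, seq2)
    same = sum(a == b for a, b in zip(seq1.upper(), seq2.upper()))
    return 100.0 * same / len(seq1)


def percent_similarity(seq1: str, seq2: str) -> float:
    """Like percent_identity, but also counts pairs drawn from the same group."""
    _check(seq1, seq2)
    related = sum(_related(a, b) for a, b in zip(seq1.upper(), seq2.upper()))
    return 100.0 * related / len(seq1)


if __name__ == "__main__":
    human, nematode = "DQNQDGCISREEM", "DVDMDGQISKDEL"
    print(round(percent_identity(human, nematode), 2))    # 46.15
    print(round(percent_similarity(human, nematode), 2))  # 69.23
```

Here 6 of 13 positions are identical, and the R/K, E/D and M/L pairs raise the count of related positions to 9 of 13. The figure obtained depends on the groups chosen, which is why the grouping is exposed as a plain list.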

The present invention extends to nucleic acid molecules with percentage similarities of 
approximately 65%, 70%, 75%, 80%, 85%, 90% or 95% or above or a percentage in between. 

The nucleic acid molecule of the present invention defined by SEQ ID NO:2 is hereinafter
referred to as constituting the "mcg4" gene. The protein encoded by mcg4 is referred to herein
as "MCG4" and has an amino acid sequence set forth in SEQ ID NO:3. The mcg4 gene is
proposed, in accordance with the present invention, to encode a regulator of gene expression
comprising a novel zinc finger domain, (HC3)2. A regulator of gene expression includes a
transcription factor. Regulation may be at the level of nucleic acid:protein or protein:protein
interaction.

The nucleic acid molecule of the present invention defined by SEQ ID NO:4 or 6 is hereinafter
referred to as constituting the "mcg7" gene. The protein encoded by mcg7 is referred to herein
as "MCG7"; it has an amino acid sequence set forth in SEQ ID NO:5 or 7 and is involved in
signal transduction. The two nucleotide sequences, and the corresponding amino acid
sequences, differ by the presence or absence of an exon at nucleotides 183-288.
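The two mcg7 cDNA forms differ by the span of nucleotides 183-288, which is 288 - 183 + 1 = 106 nucleotides long. The sketch below, with hypothetical helper names, removes a 1-based inclusive span from a sequence and reports whether the span length is a multiple of three, which is what decides whether codons read through the splice point keep their frame. Whether that matters for MCG7 depends on where translation starts relative to the exon, which this sketch does not determine.

```python
def exon_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive span."""
    return end - start + 1


def skip_exon(cdna: str, start: int, end: int) -> str:
    """Return `cdna` with the 1-based, inclusive span start..end removed."""
    if not 1 <= start <= end <= len(cdna):
        raise ValueError("exon coordinates fall outside the sequence")
    return cdna[: start - 1] + cdna[end:]


def keeps_downstream_frame(start: int, end: int) -> bool:
    """True if removing the span leaves codons read across it in the same frame."""
    return exon_length(start, end) % 3 == 0


if __name__ == "__main__":
    # Placeholder sequence: 182 nt before the exon, a 106 nt exon, 12 nt after.
    placeholder = "A" * 182 + "G" * 106 + "T" * 12
    spliced = skip_exon(placeholder, 183, 288)
    print(len(placeholder), len(spliced), exon_length(183, 288))  # 300 194 106
```

The placeholder sequence stands in for SEQ ID NO:4 and is not the real mcg7 sequence.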

The nucleic acid molecule of the present invention defined by SEQ ID NO:8 is hereinafter
referred to as constituting the "mcg18" gene. The protein encoded by mcg18 is referred to
herein as "MCG18" and comprises the amino acid sequence set forth in SEQ ID NO:9.
The present invention extends to the naturally occurring genomic mcg4, mcg7 and mcg18
nucleotide sequences or corresponding cDNA sequences or to derivatives thereof. Derivatives
contemplated in the present invention include fragments, parts, portions, mutants, homologues
and analogues of MCG4, MCG7 or MCG18 or the corresponding genetic sequences. Derivatives
also include single or multiple amino acid substitutions, deletions and/or additions to MCG4,
MCG7 or MCG18 or single or multiple nucleotide substitutions, deletions and/or additions to
mcg4, mcg7 or mcg18. "Additions" to the amino acid or nucleotide sequences include fusions
with other peptides, polypeptides or proteins or fusions to nucleotide sequences. Reference
herein to "MCG4" or "mcg4", "MCG7" or "mcg7" or "MCG18" or "mcg18" includes reference to
all derivatives thereof including functional derivatives and immunologically interactive derivatives
of MCG4, MCG7 or MCG18.

The mcg4, mcg7 and mcg18 genes of the present invention are particularly exemplified herein from
humans and in particular from human chromosome 11q13.


The present invention extends, however, to a range of homologues from, for example, primates,
livestock animals (eg. sheep, cows, horses, donkeys, pigs), companion animals (eg. dogs, cats),
laboratory test animals (eg. rabbits, mice, rats, guinea pigs), reptiles, birds (eg. chickens, ducks,
geese, parrots), insects, nematodes, eukaryotic microorganisms and captive wild animals (eg.
deer, foxes, kangaroos). Reference herein to mcg4, mcg7 and mcg18 or their respective proteins
MCG4, MCG7 and MCG18 includes reference to these molecules of human origin as well as
novel forms of non-human origin.

The nucleic acid molecules of the present invention may be DNA or RNA. When the nucleic
acid molecule is in DNA form, it may be genomic DNA or cDNA. RNA forms of the nucleic
acid molecules of the present invention are generally mRNA.

Although the nucleic acid molecules of the present invention are generally in isolated form, they
may be integrated into or ligated to or otherwise fused or associated with other genetic
molecules such as vector molecules and in particular expression vector molecules. Vectors and
expression vectors are generally capable of replication and, if applicable, expression in one or
both of a prokaryotic cell or a eukaryotic cell. Preferably, prokaryotic cells include E. coli,
Bacillus sp. and Pseudomonas sp. Preferred eukaryotic cells include yeast, fungal, mammalian
and insect cells.

Accordingly, another aspect of the present invention contemplates a genetic construct comprising
a vector portion and an animal, more particularly a mammalian and even more particularly a
human mcg4 gene portion, which mcg4 gene portion is capable of encoding an MCG4
polypeptide or a functional or immunologically interactive derivative thereof.

Preferably, the mcg4 gene portion of the genetic construct is operably linked to a promoter in
the vector such that said promoter is capable of directing expression of said mcg4 gene portion
in an appropriate cell.

In addition, the mcg4 gene portion of the genetic construct may comprise all or part of the gene
fused to another genetic sequence such as a nucleotide sequence encoding glutathione-S-transferase
or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells 
comprising same. 


It is proposed in accordance with the present invention that MCG4 is a transcription factor
involved in gene regulation. Mutations in mcg4 may result in aberrations in gene regulation
leading to the development of or a propensity to develop various types of cancer. In this regard,
although not wishing to limit the present invention to any one hypothesis or mode of action, it
is proposed that mcg4 or its expression product may be involved in the tissue-specific or
temporal regulation of particular genes.

A deletion or aberration in the mcg4 gene may also be important in the detection of cancer or
a propensity to develop cancer. An aberration may be a homozygous mutation or a
heterozygous mutation. The detection may occur at the foetal or post-natal level. Detection
may also be at the germline or somatic cell level. Furthermore, a risk of developing cancer may
be determined by assaying for aberrations in the parents and/or proband of a subject under
investigation.
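The genotype categories mentioned above can be made concrete with a short sketch. It is not part of the described method: it assumes the sequence of each allele at the locus of interest has already been determined and simply compares each allele with a reference sequence. The function name `classify_zygosity` and its category labels are hypothetical choices for illustration.

```python
def classify_zygosity(reference: str, allele1: str, allele2: str) -> str:
    """Classify a subject at one locus from two allele sequences (hypothetical helper).

    An allele is treated as aberrant if its sequence differs in any way from
    the reference sequence.
    """
    ref = reference.upper()
    a1, a2 = allele1.upper(), allele2.upper()
    aberrant1, aberrant2 = a1 != ref, a2 != ref
    if not aberrant1 and not aberrant2:
        return "no aberration detected"
    if aberrant1 and aberrant2:
        # Both alleles altered: identical changes are homozygous; different
        # changes on each allele are usually called compound heterozygous.
        return "homozygous" if a1 == a2 else "compound heterozygous"
    return "heterozygous"


# Usage with short made-up sequences
print(classify_zygosity("ATGGCC", "ATGGCC", "ATGACC"))  # heterozygous
```

Whole-sequence comparison keeps the sketch short; a real assay would compare only the interrogated positions.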

According to this aspect of the present invention, there is contemplated a method of detecting
a condition caused or facilitated by an aberration in mcg4, said method comprising determining
the presence of a single or multiple nucleotide substitution, deletion and/or addition or other
aberration to one or both alleles of said mcg4 wherein the presence of such a nucleotide
substitution, deletion and/or addition or other aberration may be indicative of said condition or
a propensity to develop said condition.

Another aspect of the present invention contemplates a genetic construct comprising a vector
portion and an animal, more particularly a mammalian and even more particularly a human mcg7
gene portion, which mcg7 gene portion is capable of encoding an MCG7 polypeptide or a
functional or immunologically interactive derivative thereof.

Preferably, the mcg7 gene portion of the genetic construct is operably linked to a promoter on 
the vector such that said promoter is capable of directing expression of said mcg7 gene portion 
in an appropriate cell. 

In addition, the mcg7 gene portion of the genetic construct may comprise all or part of the gene
fused to another genetic sequence such as a nucleotide sequence encoding glutathione-S-transferase
or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells
comprising same.

It is proposed in accordance with the present invention that MCG7 is a guanine nucleotide
exchange factor (GEF) involved in signal transduction. Mutations in mcg7 or MCG7 may result in
defective control of cell proliferation leading to the development of or a propensity to develop
various types of cancer.

A deletion or aberration in the mcg7 gene may also be important in the detection of cancer or
a propensity to develop cancer. An aberration may be a homozygous mutation or a
heterozygous mutation. The detection may occur at the foetal or post-natal level. Detection
may also be at the germline or somatic cell level. Furthermore, a risk of developing cancer may
be determined by assaying for aberrations in the parents of a subject under investigation.

According to this aspect of the present invention, there is contemplated a method of detecting
a condition caused or facilitated by an aberration in mcg7, said method comprising determining
the presence of a single or multiple nucleotide substitution, deletion and/or addition or other
aberration to one or both alleles of said mcg7 wherein the presence of such a nucleotide
substitution, deletion and/or addition or other aberration may be indicative of said condition or
a propensity to develop said condition.

Yet another aspect of the present invention contemplates a genetic construct comprising a vector
portion and an animal, more particularly a mammalian and even more particularly a human
mcg18 gene portion, which mcg18 gene portion is capable of encoding an MCG18 polypeptide
or a functional or immunologically interactive derivative thereof.

Preferably, the mcg18 gene portion of the genetic construct is operably linked to a promoter on
the vector such that said promoter is capable of directing expression of said mcg18 gene portion
in an appropriate cell.

In addition, the mcg18 gene portion of the genetic construct may comprise all or part of the gene
fused to another genetic sequence such as a nucleotide sequence encoding glutathione-S-transferase
or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells 
comprising same. 

It is proposed in accordance with the present invention that MCG18 is a transcription factor
involved in protein folding, protein complex assembly and transit through subcellular
compartments. MCG18 may also have a role in tumour suppression. Thus mutations in mcg18
may result in the development of or a propensity to develop various types of cancer.

A deletion or aberration in the mcg18 gene may also be important in the detection of cancer or
a propensity to develop cancer. An aberration may be a homozygous mutation or a
heterozygous mutation. The detection may occur at the foetal or post-natal level. Detection
may also be at the germline or somatic cell level. Furthermore, a risk of developing cancer may
be determined by assaying for aberrations in the parents and/or proband of the subject under
investigation.

According to this aspect of the present invention, there is contemplated a method of detecting
a condition caused or facilitated by an aberration in mcg18, said method comprising determining
the presence of a single or multiple nucleotide substitution, deletion and/or addition or other
aberration to one or both alleles of said mcg18 wherein the presence of such a nucleotide
substitution, deletion and/or addition or other aberration may be indicative of said condition or
a propensity to develop said condition.

The nucleotide substitutions, additions or deletions may be detected by any convenient means
including nucleotide sequencing, restriction fragment length polymorphism (RFLP) analysis,
the polymerase chain reaction (PCR), oligonucleotide hybridization and single-stranded
conformation polymorphism (SSCP) analysis, amongst many others. An aberration includes a
modification to existing nucleotides, such as one that modifies a glycosylation signal, amongst
other effects.
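Whichever technique is used to read the sequence, the final step is a comparison with a reference sequence. The sketch below is a deliberately simple illustration of that comparison and not a substitute for a proper sequence aligner: equal-length sequences are compared position by position, and for unequal lengths it assumes a single deletion or addition event and isolates it by trimming the shared prefix and suffix. The function `find_aberrations` and the example sequences are hypothetical.

```python
from typing import List, Tuple

Aberration = Tuple[str, int, str, str]  # (kind, 1-based position, reference bases, sample bases)


def find_aberrations(reference: str, sample: str) -> List[Aberration]:
    """Report nucleotide substitutions, deletions or additions relative to a reference."""
    ref, smp = reference.upper(), sample.upper()
    if len(ref) == len(smp):
        # Same length: every mismatching position is reported as a substitution.
        return [("substitution", i + 1, r, s)
                for i, (r, s) in enumerate(zip(ref, smp)) if r != s]

    # Different lengths: assume one indel event and trim the identical
    # prefix and suffix so that only the changed block remains.
    start = 0
    while start < min(len(ref), len(smp)) and ref[start] == smp[start]:
        start += 1
    end_r, end_s = len(ref), len(smp)
    while end_r > start and end_s > start and ref[end_r - 1] == smp[end_s - 1]:
        end_r -= 1
        end_s -= 1

    ref_block, smp_block = ref[start:end_r], smp[start:end_s]
    if not smp_block:
        kind = "deletion"
    elif not ref_block:
        kind = "addition"  # bases inserted before reference position start + 1
    else:
        kind = "deletion/addition"
    return [(kind, start + 1, ref_block, smp_block)]


print(find_aberrations("GATTACA", "GATACA"))  # [('deletion', 4, 'T', '')]
```

In a repeated run such as "TT" the exact position of a one-base deletion is ambiguous; this sketch reports the right-most possible position.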

In an alternative method, aberrations in the mcg4, mcg7 and mcg18 genes are detected by
screening for mutations in MCG4, MCG7 and MCG18, respectively.

A mutation in MCG4, MCG7 or MCG18 may be a single or multiple amino acid substitution,
addition and/or deletion. The mutation in mcg4, mcg7 or mcg18 may also result in either no
translation product being produced or a product in truncated form. A mutation may also result
in an altered glycosylation pattern or the introduction of side chain modifications to amino acid
residues.
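A truncated translation product often arises when a substitution, deletion or addition introduces a premature in-frame stop codon. The following sketch, an illustration rather than part of the described method, scans a coding sequence codon by codon from its first base (assumed to be the start of the reading frame) and compares where the first stop codon falls in a wild-type and a mutant sequence. It uses the standard genetic code stop codons TAA, TAG and TGA; the function names and sequences are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_stop_codon(coding_sequence: str) -> Optional[int]:
    """Return the 1-based codon number of the first in-frame stop codon, or None."""
    seq = coding_sequence.upper()
    # Step through complete codons only; a trailing partial codon is ignored.
    for offset in range(0, len(seq) - 2, 3):
        if seq[offset:offset + 3] in STOP_CODONS:
            return offset // 3 + 1
    return None


def predicts_truncation(wild_type_cds: str, mutant_cds: str) -> bool:
    """True if the mutant reaches an in-frame stop codon earlier than the wild type."""
    wt_stop = first_stop_codon(wild_type_cds)
    mut_stop = first_stop_codon(mutant_cds)
    if mut_stop is None:
        return False
    return wt_stop is None or mut_stop < wt_stop


wild_type = "ATGGCCAAATGGTAA"  # made-up sequence: Met-Ala-Lys-Trp-stop
nonsense = "ATGGCCTAATGGTAA"   # substitution creates TAA at codon 3
print(predicts_truncation(wild_type, nonsense))  # True
```

Loss of the start codon, which could lead to no translation product at all, is not checked by this sketch.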



WO 98/53061 



-23- 



PCT/AU98/00380 



According to this aspect of the present invention, there is provided a method of detecting a
condition caused or facilitated by an aberration in mcg4, mcg7 or mcg18, said method comprising
screening for a single or multiple amino acid substitution, deletion and/or addition to MCG4,
MCG7 or MCG18 wherein the presence of such a mutation is indicative of said condition or a
propensity to develop said condition.

A particularly convenient means of detecting a mutation in MCG4, MCG7 or MCG18 is by use
of antibodies.

Accordingly, another aspect of the present invention is directed to antibodies to MCG4, MCG7
or MCG18 and their derivatives. Such antibodies may be monoclonal or polyclonal and may be
selected from naturally occurring antibodies to MCG4, MCG7 or MCG18 or may be specifically
raised to MCG4, MCG7 or MCG18 or derivatives thereof. In the case of the latter, MCG4,
MCG7 or MCG18 or their derivatives may first need to be associated with a carrier molecule.

The antibodies to MCG4, MCG7 or MCG18 of the present invention are particularly useful as
diagnostic agents.

For example, antibodies to MCG4, MCG7 or MCG18 and their derivatives can be used to screen
for wild-type MCG4, MCG7 or MCG18 or for mutated MCG4, MCG7 or MCG18 molecules.
The latter may occur, for example, during or prior to the development of certain cancers. A
differential binding assay is also particularly useful. Techniques for such assays are well known
in the art and include, for example, sandwich assays and ELISA. Knowledge of normal MCG4,
MCG7 or MCG18 levels or the presence of wild-type MCG4, MCG7 or MCG18 may be important
for diagnosis of certain cancers or a predisposition for development of cancers or for monitoring
certain therapeutic protocols.

As stated above, antibodies to MCG4, MCG7 or MCG18 of the present invention may be
monoclonal or polyclonal or may be fragments of antibodies such as Fab fragments.
Furthermore, the present invention extends to recombinant and synthetic antibodies and to
antibody hybrids. A "synthetic antibody" is considered herein to include fragments and hybrids
of antibodies.

For example, specific antibodies can be used to screen for wild-type MCG4, MCG7 or MCG18
molecules or specific mutant molecules such as molecules having a certain deletion. This would
be important, for example, as a means of screening for levels of MCG4, MCG7 or MCG18 in
a cell extract or other biological fluid, or of purifying MCG4, MCG7 or MCG18 made by
recombinant means from culture supernatant fluid or from a cell extract. Techniques for
the assays contemplated herein are known in the art and include, for example, sandwich assays
and ELISA.

It is within the scope of this invention to include any second antibodies (monoclonal, polyclonal
or fragments of antibodies or synthetic antibodies) directed to the first mentioned antibodies
discussed above. Both the first and second antibodies may be used in detection assays, or a first
antibody may be used with a commercially available anti-immunoglobulin antibody. An antibody
as contemplated herein includes any antibody specific to any region of wild-type MCG4, MCG7
or MCG18 or to a specific mutant phenotype or to a deleted or otherwise altered region.

Both polyclonal and monoclonal antibodies are obtainable by immunization of a suitable animal
or bird with MCG4, MCG7 or MCG18 or its derivatives, and either type is utilizable for
immunoassays. The methods of obtaining both types of sera are well known in the art.
Polyclonal sera are less preferred but are relatively easily prepared by injection of a suitable
laboratory animal or bird with an effective amount of MCG4, MCG7 or MCG18 or antigenic
parts thereof or derivatives thereof, collecting serum from the animal or bird, and isolating
specific sera by any of the known immunoadsorbent techniques. Although antibodies produced
by this method are utilizable in virtually any type of immunoassay, they are generally less
favoured because of the potential heterogeneity of the product.

The use of monoclonal antibodies in an immunoassay is particularly preferred because of the
ability to produce them in large quantities and the homogeneity of the product. The preparation
of hybridoma cell lines for monoclonal antibody production, derived by fusing an immortal cell
line and lymphocytes sensitized against the immunogenic preparation, can be done by techniques
which are well known to those who are skilled in the art.

Another aspect of the present invention contemplates a method for detecting MCG4, MCG7 or
MCG18 or a derivative thereof in a biological sample, said method comprising contacting said
biological sample with an antibody specific for MCG4, MCG7 or MCG18 or its derivatives or
homologues for a time and under conditions sufficient for an antibody-MCG4, antibody-MCG7
or antibody-MCG18 complex to form, and then detecting said complex.

Preferably, the biological sample is a cell extract from a human or other animal or a bird. 

Detection of the presence of MCG4, MCG7 or MCG18 may be accomplished in a number of ways,
such as by Western blotting and ELISA procedures. A wide range of immunoassay techniques is
available, as can be seen by reference to US Patent Nos. 4,016,043, 4,424,279 and 4,018,653.
These include both single-site and two-site or "sandwich" assays of the non-competitive types,
as well as traditional competitive binding assays. These assays also include direct binding of a
labelled antibody to a target.

Sandwich assays are among the most useful and commonly used assays and are favoured for use
in the present invention. A number of variations of the sandwich assay technique exist, and all
are intended to be encompassed by the present invention. Briefly, in a typical forward assay, an
unlabelled antibody is immobilized on a solid substrate and the sample to be tested is brought
into contact with the bound molecule. After a suitable period of incubation, sufficient to allow
formation of an antibody-antigen complex, a second antibody specific to the antigen, labelled
with a reporter molecule capable of producing a detectable signal, is then added and incubated,
allowing time sufficient for the formation of another complex of antibody-antigen-labelled
antibody. Any unreacted material is washed away, and the presence of the antigen is
determined by observation of a signal produced by the reporter molecule. The results may either
be qualitative, by simple observation of the visible signal, or may be quantitated by comparing
with a control sample containing known amounts of hapten. Variations on the forward assay
include a simultaneous assay, in which both sample and labelled antibody are added
simultaneously to the bound antibody. These techniques are well known to those skilled in the
art, including any minor variations as will be readily apparent. In accordance with the present
invention the sample is one which might contain MCG4, MCG7 or MCG18, including a cell extract
or tissue biopsy. The sample is, therefore, generally a biological sample comprising biological
fluid but also extends to fermentation fluid and supernatant fluid such as from a cell culture.
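The quantitative read-out of a sandwich assay, in which the sample signal is compared with control samples containing known amounts of antigen, is commonly done with a standard curve. The sketch below assumes simple linear interpolation between the two bracketing standards; this is an illustrative choice, since many laboratories instead fit a four-parameter logistic curve. The function name and the example readings are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def interpolate_concentration(standards: List[Tuple[float, float]],
                              signal: float) -> Optional[float]:
    """Estimate antigen concentration from a reporter signal (hypothetical helper).

    standards: (known concentration, measured signal) pairs; the signal is
    assumed to rise monotonically with concentration.
    Returns None when the signal falls outside the range of the standards.
    """
    points = sorted(standards, key=lambda p: p[1])
    signals = [s for _, s in points]
    if signal < signals[0] or signal > signals[-1]:
        return None  # extrapolation is not attempted
    k = bisect_left(signals, signal)
    if signals[k] == signal:
        return points[k][0]
    (c0, s0), (c1, s1) = points[k - 1], points[k]
    # Straight line between the two standards that bracket the signal
    return c0 + (signal - s0) * (c1 - c0) / (s1 - s0)


# Made-up absorbance readings for standards in ng/ml
curve = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45), (40.0, 0.85)]
print(round(interpolate_concentration(curve, 0.35), 2))  # 15.0
```

Returning None outside the calibrated range reflects the usual practice of diluting and re-assaying such samples rather than extrapolating.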

In the typical forward sandwich assay, a first antibody having specificity for MCG4, MCG7
or MCG18, or an antigenic part or derivative thereof, is either covalently or passively bound to
a solid surface. The solid surface is typically glass or a polymer, the most commonly used
polymers being cellulose, polyacrylamide, nylon, polystyrene, polyvinyl chloride or
polypropylene. The solid supports may be in the form of tubes, beads, discs or microplates, or
any other surface suitable for conducting an immunoassay. The binding processes are well known
in the art and generally consist of cross-linking, covalent binding or physical adsorption; after
binding, the polymer-antibody complex is washed in preparation for the test sample. An aliquot
of the sample to be tested is then added to the solid phase complex and incubated for a period
of time sufficient (e.g. 2-40 minutes, or overnight if more convenient) and under suitable
conditions (e.g. from room temperature to 37 °C) to allow binding of any antigen present in the
sample to the antibody. Following the incubation period, the antibody-antigen solid phase is
washed and dried and incubated with a second antibody specific for a portion of the hapten. The
second antibody is linked to a reporter molecule which is used to indicate the binding of the
second antibody to the hapten.

An alternative method involves immobilizing the target molecules in the biological sample and
then exposing the immobilized target to specific antibody which may or may not be labelled with
a reporter molecule. Depending on the amount of target and the strength of the reporter
molecule signal, a bound target may be detectable by direct labelling with the antibody.
Alternatively, a second labelled antibody, specific to the first antibody, is exposed to the
target-first antibody complex to form a target-first antibody-second antibody tertiary complex.
The complex is detected by the signal emitted by the reporter molecule.

By "reporter molecule" as used in the present specification, is meant a molecule which, by its
chemical nature, provides an analytically identifiable signal which allows the detection of
antigen-bound antibody. Detection may be either qualitative or quantitative. The most commonly
used reporter molecules in this type of assay are enzymes, fluorophores, radionuclide-containing
molecules (i.e. radioisotopes) and chemiluminescent molecules.

In the case of an enzyme immunoassay, an enzyme is conjugated to the second antibody,
generally by means of glutaraldehyde or periodate. As will be readily recognized, however, a
wide variety of different conjugation techniques exist, which are readily available to the skilled
artisan. Commonly used enzymes include horseradish peroxidase, glucose oxidase,
beta-galactosidase and alkaline phosphatase, amongst others. The substrates to be used with the
specific enzymes are generally chosen for the production, upon hydrolysis by the corresponding
enzyme, of a detectable colour change. Examples of suitable enzymes include alkaline
phosphatase and peroxidase. It is also possible to employ fluorogenic substrates, which yield a
fluorescent product, rather than the chromogenic substrates noted above. In all cases, the
enzyme-labelled antibody is added to the first antibody-hapten complex, allowed to bind, and
then the excess reagent is washed away. A solution containing the appropriate substrate is then
added to the complex of antibody-antigen-antibody. The substrate will react with the enzyme
linked to the second antibody, giving a qualitative visual signal, which may be further quantitated,
usually spectrophotometrically, to give an indication of the amount of hapten which was present
in the sample. "Reporter molecule" also extends to use of cell agglutination or inhibition of
agglutination such as red blood cells on latex beads, and the like.

Alternatively, fluorescent compounds, such as fluorescein and rhodamine, may be chemically
coupled to antibodies without altering their binding capacity. When activated by illumination
with light of a particular wavelength, the fluorochrome-labelled antibody absorbs the light
energy, inducing a state of excitability in the molecule, followed by emission of the light at a
characteristic colour visually detectable with a light microscope. As in the EIA, the fluorescently
labelled antibody is allowed to bind to the first antibody-hapten complex. After washing off the
unbound reagent, the remaining tertiary complex is then exposed to light of the appropriate
wavelength, and the fluorescence observed indicates the presence of the hapten of interest.
Immunofluorescence and EIA techniques are both very well established in the art and are
particularly preferred for the present method. However, other reporter molecules, such as
radioisotope, chemiluminescent or bioluminescent molecules, may also be employed.

As stated above, the present invention extends to genetic constructs capable of encoding MCG4,
MCG7 or MCG18 or functional derivatives thereof. Such genetic constructs are also
contemplated to be useful in modulating expression of specific genes in which mcg4, mcg7 or
mcg18 is involved in tissue-specific or temporal regulation.

Accordingly, another aspect of the present invention is directed to a genetic construct comprising
a nucleotide sequence encoding a peptide, polypeptide or protein and mcg4, mcg7 or mcg18 or
a functional derivative or homologue thereof capable of modulating the expression of said
nucleotide sequence.

As stated above, MCG18 is proposed to have a role in tumour suppression. Accordingly, it is
further proposed in accordance with the present invention to use recombinant MCG18 in
pharmaceutical preparations for treating, arresting or otherwise ameliorating the effects of certain
cancers.

Accordingly, another aspect of the present invention contemplates a method for treating,
arresting or otherwise ameliorating the effects of a cancer in an animal or bird, said method
comprising administering to said animal or bird an effective amount of MCG18 or a functional
derivative thereof for a time and under conditions sufficient to treat, arrest or otherwise
ameliorate the effects of said cancer.

The present invention, therefore, contemplates a pharmaceutical composition comprising
MCG18 or a derivative thereof or a modulator of mcg18 expression or MCG18 activity and one
or more pharmaceutically acceptable carriers and/or diluents. These components are referred
to hereinafter as the "active ingredients". The active ingredients may also include anti-cancer
agents or agents which facilitate actions of MCG18.

The pharmaceutical forms suitable for injectable use include sterile aqueous solutions (where
water soluble) and sterile powders for the extemporaneous preparation of sterile injectable
solutions. The form must be stable under the conditions of manufacture and storage and must be
preserved against the contaminating action of microorganisms such as bacteria and fungi. The
carrier may be a solvent medium containing, for example, water, ethanol, polyol (for example,
glycerol, propylene glycol and liquid polyethylene glycol and the like), suitable mixtures thereof,
and vegetable oils. The proper fluidity can be maintained, for example, by the use of a coating
such as lecithin and by the use of surfactants. The prevention of the action of microorganisms
can be brought about by various antibacterial and antifungal agents, for example, parabens,
chlorobutanol, phenol, sorbic acid, thimerosal and the like. In many cases, it will be preferable to
include isotonic agents, for example, sugars or sodium chloride. Prolonged absorption of the
injectable compositions can be brought about by the use in the compositions of agents delaying
absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions are prepared by incorporating the active compounds in the required
amount in the appropriate solvent with various other ingredients enumerated above, as
required, followed by filter sterilization. In the case of sterile powders for the preparation of
sterile injectable solutions, the preferred methods of preparation are vacuum drying and the
freeze-drying technique, which yield a powder of the active ingredient plus any additional desired
ingredient from a previously sterile-filtered solution thereof.

When the active ingredients are suitably protected they may be orally administered, for example,
with an inert diluent or with an assimilable edible carrier, or they may be enclosed in hard or soft
shell gelatin capsules, compressed into tablets, or incorporated directly with the food of the
diet. For oral therapeutic administration, the active compound may be incorporated with
excipients and used in the form of ingestible tablets, buccal tablets, troches, capsules, elixirs,
suspensions, syrups, wafers, and the like. Such compositions and preparations should contain at
least 1% by weight of active compound. The percentage of the compositions and preparations
may, of course, be varied and may conveniently be between about 5% and about 80% of the
weight of the unit. The amount of active compound in such therapeutically useful compositions
is such that a suitable dosage will be obtained. Preferred compositions or preparations according
to the present invention are prepared so that an oral dosage unit form contains between about
0.1 µg and 2000 mg of active compound.
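The proportions given above can be checked with simple arithmetic. The sketch below computes the percentage by weight of active compound in a dosage unit and tests it against the 1% minimum, the convenient 5% to 80% range and the 0.1 µg to 2000 mg content range stated in the text. The function name, the returned keys and the example tablet masses are hypothetical.

```python
def check_oral_dosage_unit(active_mg: float, unit_mg: float) -> dict:
    """Compare an oral dosage unit with the ranges stated in the text (illustrative)."""
    if unit_mg <= 0 or active_mg < 0 or active_mg > unit_mg:
        raise ValueError("active mass must lie between 0 and the total unit mass")
    percent = 100.0 * active_mg / unit_mg
    return {
        "percent_by_weight": percent,
        "at_least_1_percent": percent >= 1.0,
        "within_5_to_80_percent": 5.0 <= percent <= 80.0,
        # 0.1 ug = 0.0001 mg
        "within_0.1ug_to_2000mg": 0.0001 <= active_mg <= 2000.0,
    }


# A hypothetical 250 mg tablet containing 50 mg of active compound
print(check_oral_dosage_unit(50.0, 250.0)["percent_by_weight"])  # 20.0
```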

The tablets, troches, pills, capsules and the like may also contain the components listed
hereafter: a binder such as gum, acacia, corn starch or gelatin; excipients such as dicalcium
phosphate; a disintegrating agent such as corn starch, potato starch, alginic acid and the like;
a lubricant such as magnesium stearate; and a sweetening agent such as sucrose, lactose or
saccharin, or a flavouring agent such as peppermint, oil of wintergreen or cherry flavouring.
When the dosage unit form is a capsule, it may contain, in addition to materials of the above
type, a liquid carrier. Various other materials may be present as coatings or to otherwise modify
the physical form of the dosage unit. For instance, tablets, pills or capsules may be coated with
shellac, sugar or both. A syrup or elixir may contain the active compound, sucrose as a
sweetening agent, methyl and propylparabens as preservatives, a dye and flavouring such as
cherry or orange flavour. Of course, any material used in preparing any dosage unit form should
be pharmaceutically pure and substantially non-toxic in the amounts employed. In addition, the
active compound(s) may be incorporated into sustained-release preparations and formulations.

The present invention also extends to forms suitable for topical application such as creams,
lotions and gels.

Pharmaceutically acceptable carriers and/or diluents include any and all solvents, dispersion
media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents and
the like. The use of such media and agents for pharmaceutical active substances is well known
in the art. Except insofar as any conventional media or agent is incompatible with the active
ingredient, use thereof in the therapeutic compositions is contemplated. Supplementary active
ingredients can also be incorporated into the compositions.

It is especially advantageous to formulate parenteral compositions in dosage unit form for ease
of administration and uniformity of dosage. Dosage unit form as used herein refers to physically
discrete units suited as unitary dosages for the mammalian subjects to be treated, each unit
containing a predetermined quantity of active material calculated to produce the desired
therapeutic effect in association with the required pharmaceutical carrier. The specification for
the novel dosage unit forms of the invention is dictated by and directly dependent on (a) the
unique characteristics of the active material and the particular therapeutic effect to be achieved,
and (b) the limitations inherent in the art of compounding such an active material for the
treatment of disease in living subjects having a diseased condition in which bodily health is
impaired, as herein disclosed in detail.

The principal active ingredient is compounded for convenient and effective administration in
effective amounts with a suitable pharmaceutically acceptable carrier in dosage unit form as
hereinbefore disclosed. A unit dosage form can, for example, contain the principal active
compound in amounts ranging from 0.5 µg to about 2000 mg. Expressed in proportions, the
active compound is generally present at from about 0.5 µg to about 2000 mg per ml of carrier. In
the case of compositions containing supplementary active ingredients, the dosages are
determined by reference to the usual dose and manner of administration of the said ingredients.

Effective amounts contemplated by the present invention include those amounts effective to
ameliorate a condition. For example, it is envisaged that effective amounts would range from
about 0.001 µg/kg body weight to about 100 mg/kg body weight. Alternatively, effective
amounts may range from about 0.01 µg/kg body weight to about 10 mg/kg body weight, or even
from about 0.1 µg/kg body weight to about 1 mg/kg body weight. Administration may be per
minute, hour, day, week, month or year, or may be a one-off administration.
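Converting a per-kilogram amount into a dose for a particular subject is a single multiplication. The sketch below works in µg/kg, rejects values outside the broadest range given above (0.001 µg/kg to 100 mg/kg, that is 100,000 µg/kg), and returns the total dose in µg. The function name and example body weight are hypothetical, and choosing an actual dose is outside the scope of this illustration.

```python
def total_dose_ug(dose_ug_per_kg: float, body_weight_kg: float) -> float:
    """Total dose in micrograms for a per-kilogram dose (hypothetical helper)."""
    # Broadest range in the text: 0.001 ug/kg to 100 mg/kg (= 100,000 ug/kg)
    if not 0.001 <= dose_ug_per_kg <= 100_000:
        raise ValueError("dose outside the 0.001 ug/kg to 100 mg/kg range")
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return dose_ug_per_kg * body_weight_kg


# 1 mg/kg (1000 ug/kg) for a 70 kg subject
print(total_dose_ug(1000, 70))  # 70000
```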

The pharmaceutical composition may also comprise genetic molecules such as a vector capable
of transfecting target cells where the vector carries a nucleic acid molecule capable of modulating
mcg18 expression or MCG18 activity. The vector may, for example, be a viral vector.

As stated above, the present invention further contemplates a range of derivatives of MCG18. 

Derivatives include fragments, parts, portions, mutants, homologues and analogues of the
MCG18 polypeptide and corresponding genetic sequence. Derivatives also include single or
multiple amino acid substitutions, deletions and/or additions to MCG18 or single or multiple
nucleotide substitutions, deletions and/or additions to the genetic sequence encoding MCG18.
"Additions" to amino acid sequences or nucleotide sequences include fusions with other
peptides, polypeptides or proteins or fusions to nucleotide sequences. Reference herein to
"MCG18" includes reference to all derivatives thereof, including functional derivatives and
immunologically interactive derivatives of MCG18.

Analogues of MCG18 contemplated herein include, but are not limited to, modifications to side
chains, incorporation of unnatural amino acids and/or their derivatives during peptide,
polypeptide or protein synthesis and the use of crosslinkers and other methods which impose
conformational constraints on the proteinaceous molecule or their analogues.

Examples of side chain modifications contemplated by the present invention include
modifications of amino groups such as by reductive alkylation by reaction with an aldehyde
followed by reduction with NaBH4; amidination with methylacetimidate; acylation with acetic
anhydride; carbamoylation of amino groups with cyanate; trinitrobenzylation of amino groups
with 2,4,6-trinitrobenzene sulphonic acid (TNBS); acylation of amino groups with succinic
anhydride and tetrahydrophthalic anhydride; and pyridoxylation of lysine with
pyridoxal-5-phosphate followed by reduction with NaBH4.

The guanidine group of arginine residues may be modified by the formation of heterocyclic
condensation products with reagents such as 2,3-butanedione, phenylglyoxal and glyoxal.

The carboxyl group may be modified by carbodiimide activation via O-acylisourea formation
followed by subsequent derivatisation, for example, to a corresponding amide.

Sulphydryl groups may be modified by methods such as carboxymethylation with iodoacetic acid
or iodoacetamide; performic acid oxidation to cysteic acid; formation of mixed disulphides
with other thiol compounds; reaction with maleimide, maleic anhydride or other substituted
maleimide; formation of mercurial derivatives using 4-chloromercuribenzoate,
4-chloromercuriphenylsulphonic acid, phenylmercury chloride, 2-chloromercuri-4-nitrophenol and
other mercurials; and carbamoylation with cyanate at alkaline pH.

Tryptophan residues may be modified by, for example, oxidation with N-bromosuccinimide or
alkylation of the indole ring with 2-hydroxy-5-nitrobenzyl bromide or sulphenyl halides.
Tyrosine residues, on the other hand, may be altered by nitration with tetranitromethane to form
a 3-nitrotyrosine derivative.

Modification of the imidazole ring of a histidine residue may be accomplished by alkylation with
iodoacetic acid derivatives or N-carbethoxylation with diethylpyrocarbonate.

Examples of incorporating unnatural amino acids and derivatives during peptide synthesis
include, but are not limited to, use of norleucine, 4-aminobutyric acid,
4-amino-3-hydroxy-5-phenylpentanoic acid, 6-aminohexanoic acid, t-butylglycine, norvaline,
phenylglycine, ornithine, sarcosine, 4-amino-3-hydroxy-6-methylheptanoic acid,
2-thienylalanine and/or D-isomers of amino acids. A list of unnatural amino acids contemplated
herein is shown in Table 3.
TABLE 3

Code     Non-conventional amino acid
Abu      α-aminobutyric acid
Mgabu    α-amino-α-methylbutyrate
Cpro     aminocyclopropane-carboxylate
Aib      aminoisobutyric acid
Norb     aminonorbornyl-carboxylate
Chexa    cyclohexylalanine
Cpen     cyclopentylalanine
Dal      D-alanine
Darg     D-arginine
Dasp     D-aspartic acid
Dcys     D-cysteine
Dgln     D-glutamine
Dglu     D-glutamic acid
Dhis     D-histidine
Dile     D-isoleucine
Dleu     D-leucine
Dlys     D-lysine
Dmet     D-methionine
Dorn     D-ornithine
Dphe     D-phenylalanine
Dpro     D-proline
Dser     D-serine
Dthr     D-threonine
Dtrp     D-tryptophan
Dtyr     D-tyrosine
Dval     D-valine
Dmala    D-α-methylalanine
Dmarg    D-α-methylarginine
Dmasn    D-α-methylasparagine
Dmasp    D-α-methylaspartate
Dmcys    D-α-methylcysteine
Dmgln    D-α-methylglutamine
Dmhis    D-α-methylhistidine
Dmile    D-α-methylisoleucine
Dmleu    D-α-methylleucine
Dmlys    D-α-methyllysine
Dmmet    D-α-methylmethionine
Dmorn    D-α-methylornithine
Dmphe    D-α-methylphenylalanine
Dmpro    D-α-methylproline
Dmser    D-α-methylserine
Dmthr    D-α-methylthreonine
Dmtrp    D-α-methyltryptophan
Dmty     D-α-methyltyrosine
Dmval    D-α-methylvaline
Dnmala   D-N-methylalanine
Dnmarg   D-N-methylarginine
Dnmasn   D-N-methylasparagine
Dnmasp   D-N-methylaspartate
Dnmcys   D-N-methylcysteine
Dnmgln   D-N-methylglutamine
Dnmglu   D-N-methylglutamate
Dnmhis   D-N-methylhistidine
Dnmile   D-N-methylisoleucine
Dnmleu   D-N-methylleucine
Dnmlys   D-N-methyllysine
Nmchexa  N-methylcyclohexylalanine
Dnmorn   D-N-methylornithine
Nala     N-methylglycine
Nmaib    N-methylaminoisobutyrate
Nile     N-(1-methylpropyl)glycine
Nleu     N-(2-methylpropyl)glycine
Dnmtrp   D-N-methyltryptophan
Dnmtyr   D-N-methyltyrosine
Dnmval   D-N-methylvaline
Gabu     γ-aminobutyric acid
Tbug     L-t-butylglycine
Etg      L-ethylglycine
Hphe     L-homophenylalanine
Marg     L-α-methylarginine
Masp     L-α-methylaspartate
Mcys     L-α-methylcysteine
Mgln     L-α-methylglutamine
Mhis     L-α-methylhistidine
Mile     L-α-methylisoleucine
Mleu     L-α-methylleucine
Mmet     L-α-methylmethionine
Mnva     L-α-methylnorvaline
Mphe     L-α-methylphenylalanine
Mser     L-α-methylserine
Mtrp     L-α-methyltryptophan
Mval     L-α-methylvaline
Nnbhm    N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine
Nmbc     1-carboxy-1-(2,2-diphenylethylamino)cyclopropane
Nmala    L-N-methylalanine
Nmarg    L-N-methylarginine
Nmasn    L-N-methylasparagine
Nmasp    L-N-methylaspartic acid
Nmcys    L-N-methylcysteine
Nmgln    L-N-methylglutamine
Nmglu    L-N-methylglutamic acid
Nmhis    L-N-methylhistidine
Nmile    L-N-methylisoleucine
Nmleu    L-N-methylleucine
Nmlys    L-N-methyllysine
Nmmet    L-N-methylmethionine
Nmnle    L-N-methylnorleucine
Nmnva    L-N-methylnorvaline
Nmorn    L-N-methylornithine
Nmphe    L-N-methylphenylalanine
Nmpro    L-N-methylproline
Nmser    L-N-methylserine
Nmthr    L-N-methylthreonine
Nmtrp    L-N-methyltryptophan
Nmtyr    L-N-methyltyrosine
Nmval    L-N-methylvaline
Nmetg    L-N-methylethylglycine
Nmtbug   L-N-methyl-t-butylglycine
Nle      L-norleucine
Nva      L-norvaline
Maib     α-methyl-aminoisobutyrate
Mgabu    α-methyl-γ-aminobutyrate
Mchexa   α-methylcyclohexylalanine
Mcpen    α-methylcyclopentylalanine
Manap    α-methyl-α-naphthylalanine
Mpen     α-methylpenicillamine
Nglu     N-(4-aminobutyl)glycine
Naeg     N-(2-aminoethyl)glycine
Norn     N-(3-aminopropyl)glycine
Nmaabu   N-amino-α-methylbutyrate
Anap     α-naphthylalanine
Nphe     N-benzylglycine
Ngln     N-(2-carbamylethyl)glycine
Nasn     N-(carbamylmethyl)glycine
Nglu     N-(2-carboxyethyl)glycine
Nasp     N-(carboxymethyl)glycine
Ncbut    N-cyclobutylglycine
Nchep    N-cycloheptylglycine
Nchex    N-cyclohexylglycine
Ncdec    N-cyclodecylglycine
Ncdod    N-cyclododecylglycine
Ncoct    N-cyclooctylglycine
Ncpro    N-cyclopropylglycine
Ncund    N-cycloundecylglycine
Nbhm     N-(2,2-diphenylethyl)glycine
Nbhe     N-(3,3-diphenylpropyl)glycine
Narg     N-(3-guanidinopropyl)glycine
Nthr     N-(1-hydroxyethyl)glycine
Nser     N-(hydroxyethyl)glycine
Nhis     N-(imidazolylethyl)glycine
Nhtrp    N-(3-indolylethyl)glycine
Nmgabu   N-methyl-γ-aminobutyrate
Dnmmet   D-N-methylmethionine
Nmcpen   N-methylcyclopentylalanine
Dnmphe   D-N-methylphenylalanine
Dnmpro   D-N-methylproline
Dnmser   D-N-methylserine
Dnmthr   D-N-methylthreonine
Nval     N-(1-methylethyl)glycine
Nmanap   N-methyl-α-naphthylalanine
Nmpen    N-methylpenicillamine
Nhtyr    N-(p-hydroxyphenyl)glycine
Ncys     N-(thiomethyl)glycine
Pen      penicillamine
Mala     L-α-methylalanine
Masn     L-α-methylasparagine
Mtbug    L-α-methyl-t-butylglycine
Metg     L-methylethylglycine
Mglu     L-α-methylglutamate
Mhphe    L-α-methylhomophenylalanine
Nmet     N-(2-methylthioethyl)glycine
Mlys     L-α-methyllysine
Mnle     L-α-methylnorleucine
Morn     L-α-methylornithine
Mpro     L-α-methylproline
Mthr     L-α-methylthreonine
Mtyr     L-α-methyltyrosine
Nmhphe   L-N-methylhomophenylalanine
Nnbhe    N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine


Crosslinkers can be used, for example, to stabilise 3D conformations, using homo-bifunctional
crosslinkers such as the bifunctional imido esters having (CH2)n spacer groups with n = 1 to
n = 6, glutaraldehyde, N-hydroxysuccinimide esters and hetero-bifunctional reagents which usually
contain an amino-reactive moiety such as N-hydroxysuccinimide and another group-specific
reactive moiety such as a maleimido or dithio moiety (SH) or carbodiimide (COOH). In addition,
peptides can be conformationally constrained by, for example, incorporation of Cα- and
Nα-methylamino acids, introduction of double bonds between the Cα and Cβ atoms of amino acids
and the formation of cyclic peptides or analogues by introducing covalent bonds, such as forming
an amide bond between the N and C termini, between two side chains or between a side chain and
the N or C terminus.

Such analogues also apply in respect of MCG4 and MCG7. 


The present invention further contemplates chemical analogues of MCG18 capable of acting as
antagonists or agonists of MCG18 or which can act as functional analogues of MCG18.
Chemical analogues may not necessarily be derived from MCG18 but may share certain
conformational similarities. Alternatively, chemical analogues may be specifically designed to
mimic certain physicochemical properties of MCG18. Chemical analogues may be chemically
synthesised or may be detected following, for example, natural product screening.

The identification of MCG18 permits the generation of a range of therapeutic molecules capable
of modulating expression of MCG18 or modulating the activity of MCG18. Modulators
contemplated by the present invention include agonists and antagonists of MCG18 expression.
Antagonists of MCG18 expression include antisense molecules, ribozymes and co-suppression
molecules. Agonists include molecules which increase promoter activity or interfere with negative
regulatory mechanisms. Agonists of MCG18 include molecules which overcome any negative
regulatory mechanism. Antagonists of MCG18 include antibodies and inhibitor peptide
fragments.


These types of modifications may be important to stabilise MCG18 if administered to an 
individual or for use as a diagnostic reagent. 

Other derivatives contemplated by the present invention include a range of glycosylation variants
from a completely unglycosylated molecule to a modified glycosylated molecule. Altered
glycosylation patterns may result from expression of recombinant molecules in different host
cells.

Another embodiment of the present invention contemplates a method for modulating expression
of MCG18 in a human, said method comprising contacting the mcg18 gene encoding MCG18
with an effective amount of a modulator of mcg18 expression for a time and under conditions
sufficient to up-regulate, down-regulate or otherwise modulate expression of mcg18. For
example, a nucleic acid molecule encoding MCG18 or a derivative thereof may be introduced
into a cell to facilitate protection of that cell from becoming cancerous.


Another aspect of the present invention contemplates a method of modulating activity of MCG18
in a human, said method comprising administering to said human a modulating effective amount
of a molecule for a time and under conditions sufficient to increase or decrease MCG18 activity.
The molecule may be a proteinaceous molecule or a chemical entity and may also be a derivative
of MCG18 or a chemical analogue or truncation mutant of MCG18.

The present invention is further described with reference to the following non-limiting Examples. 
EXAMPLE 1

A human gene (designated mcg4) was identified on chromosome 11q13 that, on the basis of
sequence homology, is predicted to encode a putative transcription factor of 310 amino acids
(Fig. 1). mcg4 is transcribed in several different cell lines (Fig. 7).



EXAMPLE 2 



The expressed sequence tag (EST) database contains partial sequence data for the murine
(Fig. 2) and nematode (Fig. 3) homologues of mcg4.



EXAMPLE 3 



MCG4 contains a sequence of cysteine residues within the N-terminal region of the protein that
resembles zinc-finger binding domains of a novel type, i.e. (HC3)2 [Fig. 4].



EXAMPLE 4 



Sensitive sequence homology searches reveal that related cysteine-containing motifs are present
in another C. elegans protein (Fig. 5) as well as the GATA-binding transcription factor from
S. pombe (Fig. 6).



EXAMPLE 5



mcg4 will have commercial value due to its likelihood of encoding a novel transcription factor
that is highly conserved amongst organisms, thus suggesting an integral role in gene regulation.
mcg4 may also be involved in tissue-specific or temporal regulation of certain genes,
thus making it a potential target for modulating expression of those downstream effectors.



EXAMPLE 6

Nucleotide sequence data generated from cosmid clone cSRL-72c4 with the T7 primer
(Promega, and Applied Biosystems Incorporated dye terminator sequencing kit) were aligned to
the GenBank Expressed Sequence Tag (EST) database using the program BLASTN (Altschul
et al, 1990) and were found to match numerous human and mouse entries (Table 4 and Figure 2).
These matching ESTs were further used to identify overlapping entries in the EST database
(Table 5). The nucleotide sequences of these human ESTs were compiled using MacVector
4.2.1 software (IBI-Kodak) to produce the cDNA sequence shown in Figure 1. EST entries
AA074703 and AA134788 are closely related at the nucleotide level to mcg4 and it is, therefore,
likely that mcg4 is a member of a newly discovered gene family (Figure 8).
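The compilation of overlapping ESTs into a single cDNA sequence can be illustrated with a
minimal greedy assembler. This is a sketch under stated assumptions and not the MacVector
procedure used here: all reads are assumed to be on the same strand, overlaps must be exact
suffix/prefix matches of at least min_overlap bases, and the function names merge_pair and
assemble are hypothetical.

```python
from typing import List, Optional


def merge_pair(left: str, right: str, min_overlap: int = 20) -> Optional[str]:
    """Join left and right on the longest exact suffix/prefix overlap, or return None."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


def assemble(reads: List[str], min_overlap: int = 20) -> List[str]:
    """Greedily merge the best-overlapping pair until no pair overlaps by min_overlap bases."""
    contigs = [r.upper() for r in reads]
    while True:
        best = None  # (overlap length, index of left read, index of right read, merged sequence)
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                joined = merge_pair(left, right, min_overlap)
                if joined is not None:
                    overlap = len(left) + len(right) - len(joined)
                    if best is None or overlap > best[0]:
                        best = (overlap, i, j, joined)
        if best is None:
            return contigs
        _, i, j, joined = best
        # Replace the two merged reads by their joined sequence and try again.
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [joined]
```

For example, assemble(["ATGCGTACGTTAGC", "CGTTAGCCATGGA", "CATGGATTTAA"], min_overlap=5)
collapses the three reads into the single contig "ATGCGTACGTTAGCCATGGATTTAA". Real EST data
would additionally need reverse-complement handling and tolerance of sequencing errors.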

The cDNA sequence of mcg4 was translated in all possible reading frames and compared to the
GenBank non-redundant protein database using the program BLASTX (Altschul et al, 1990) at
the National Center for Biotechnology Information (http//www.ncbi.nih.gov.nlm). As the
protein appeared to be novel, a translation of the longest reading frame of the mcg4 cDNA was
aligned to the EST database using the program TBLASTN, which performs a dynamic
translation of the EST database in all 6 frames. The search results indicated that the nematode
C. elegans had an MCG4-like protein (Figure 3), with the matching domains containing a spatial
sequence of cysteine and histidine residues which resembled a zinc-finger structure (Figure 4).
The program BLASTP was therefore used to conduct sensitive searches of the protein
databases for similar zinc-finger motifs. A weak match to the putative zinc-finger domain was
observed for another protein from C. elegans (Figure 5) and a poorer match for the
GATA-binding transcription factor from S. pombe (Figure 6). The putative initiation codon of
human mcg4 is not preceded by an in-frame stop codon and it is therefore possible that the cDNA
described in Figure 1 is a truncated form. However, sequence alignment of human and mouse
mcg4 ESTs showed a lower degree of nucleotide conservation prior to the assigned initiation
codon, thus supporting the notion that this region represents the 5' UTR (Figure 9). To
determine the expression pattern of mcg4, 15 µg of total cellular RNA (RNeasy Mini Kit,
Qiagen) from various human cell lines grown in culture was electrophoresed through 1.2% w/v
MOPS/formaldehyde gels and blotted onto nylon membranes (Amersham) by capillary transfer
using 20 x SSC (Sambrook et al, 1989). Filters were subsequently UV-fixed and hybridised
overnight at 65°C to a radiolabelled (32P-dCTP) cDNA probe (Church and Gilbert, 1984) for
mcg4. After washes in 0.1 x SSC/0.1% w/v SDS at 65°C for 1 hour, the filters were air-dried
and exposed to X-ray film. This Northern analysis showed that mcg4 is expressed as a 1.6kb
message in numerous tissues including breast, ovary, bladder, lung and keratinocytes (Figure 7).
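The six-frame translation that BLASTX and TBLASTN perform, and the test for an in-frame stop
codon upstream of a putative initiation codon, can be sketched as follows. This is an
illustrative sketch only, assuming the standard genetic code and a plain DNA string; the helper
names (translate, six_frame_translation, longest_orf, has_upstream_inframe_stop) are
hypothetical and are not part of the BLAST programs.

```python
from typing import Dict

# Standard genetic code, codons enumerated in TCAG order (first base outermost).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; codons with unknown bases (e.g. N) become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> Dict[str, str]:
    """Frames +1..+3 read the given strand; -1..-3 read its reverse complement."""
    fwd, rev = seq.upper(), reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(fwd[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


def longest_orf(protein: str) -> str:
    """Longest stop-free stretch of a translated frame (a crude open reading frame proxy)."""
    return max(protein.split("*"), key=len)


def has_upstream_inframe_stop(seq: str, atg_index: int) -> bool:
    """True if a stop codon lies upstream of atg_index (0-based) in the same reading frame."""
    seq = seq.upper()
    for i in range(atg_index - 3, -1, -3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False
```

A cDNA whose assigned ATG returns False from has_upstream_inframe_stop, as described above for
mcg4, leaves open the possibility that the true start lies further 5'.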



EXAMPLE 7 






A human gene (designated mcg7) was identified and isolated from chromosome 11q13 which
encodes a protein that bears striking homology with guanine nucleotide exchange factors (GEFs)
from a wide variety of organisms (Fig. 12).



EXAMPLE 8 

The composite mcg7 cDNA sequence is at least 2.4kb in length and Figure 13(a) shows a
predicted translation product of at least 609 amino acids beginning at methionine 120. An
alternative start site due to alternative exon splicing (indicated in lower case) may yield a protein
of 671 amino acids starting at methionine 58 (Fig. 13a).

EXAMPLE 9

An mcg7 homologue from C. elegans has been identified, the product of which is highly
conserved with MCG7 (Fig. 14). Several salient features of the protein have been underlined in
Fig. 14, namely a guanine nucleotide binding region, a diacylglycerol binding region and
"EF-hand" calcium binding regions. In addition, there are several potential cAMP-dependent
protein kinase, protein kinase C and casein kinase II phosphorylation sites, as well as a number of
potential sites for glycosylation (not indicated).

EXAMPLE 10 


A number of partial human and murine EST clones exist for mcg7. The GenBank database
contains a cDNA (Acc. no. Y12336) encoding a full-length open reading frame (ORF) for human
mcg7 as well as a partial murine mcg7 ORF (Y12339). In addition, the complete genomic
sequence of the human mcg7 gene is contained within GenBank entry AC000134.

EXAMPLE 11

The best characterised GEFs are those acting on members of the ras family of oncoproteins,
which play a pivotal role in signal transduction and, when mutated, are responsible for tumour
development. A variety of therapeutic regimes for cancer treatment have been designed to
interfere specifically with ras signalling pathways. There is potential, therefore, that the product
of mcg7 could also be a target for such clinical strategies.

EXAMPLE 12 

The nucleotide sequence of the mcg7 cDNA was extended 5' with genomic DNA sequence from
GenBank accession number AC000134 (positions 1-321) and analysed for additional coding
sequence 5' to the putative initiation codon (nt 681-683) (Fig. 16). An additional in-frame ATG
occurs at position nt 495-497 when the alternatively spliced exon (position nt 504-609) is present
(also shown in Fig. 13(a)). The sequence context of this ATG closely matches the Kozak
consensus. When this exon is absent, the ATG is not in frame and other possible initiation codons
are absent (resulting translation shown in lower case lettering) (also shown in Fig. 13(b)). Further
evidence that the initiation codon at position nt 681-683 is the true initiation site is given in
Figure 15.

Alignment of human and partial murine mcg7 cDNA sequences is shown in Figure 15. The
putative initiation codon is at position nt 360-362. Both murine ESTs appear to have an
upstream in-frame stop codon at position nt 326-328, downstream of the differentially spliced
exon, and the sequence alignment thus suggests that this region represents the 5' UTR of mcg7.

Furthermore, similarity with the C. elegans homologue strongly suggests that the ATG codon at
position nt 360-362 encodes the N-terminus of MCG7.
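The frame relationships described in this Example follow from arithmetic on the quoted
nucleotide positions: two codons share a reading frame when the distance between their first
bases, after subtracting any spliced-out sequence, is a multiple of three. The sketch below
checks this for the ATG at nt 495-497, the putative initiation codon at nt 681-683 and the
alternatively spliced exon at nt 504-609. It also adds a simplified Kozak test that looks only
at the two most influential positions (a purine at -3 and G at +4); the function names are
hypothetical and the Kozak check is a simplification, not a full consensus score.

```python
def same_frame(upstream_start: int, downstream_start: int, removed_nt: int = 0) -> bool:
    """True if two codons (1-based first-base positions) share a reading frame
    once removed_nt bases lying between them have been spliced out."""
    return (downstream_start - upstream_start - removed_nt) % 3 == 0


def kozak_key_positions(seq: str, atg_index: int) -> bool:
    """Simplified Kozak test: purine (A/G) at -3 and G at +4 (atg_index = 0-based index of the A)."""
    seq = seq.upper()
    if atg_index < 3 or atg_index + 3 >= len(seq) or seq[atg_index:atg_index + 3] != "ATG":
        return False
    return seq[atg_index - 3] in "AG" and seq[atg_index + 3] == "G"


UPSTREAM_ATG = 495          # additional ATG, nt 495-497
PUTATIVE_ATG = 681          # putative initiation codon, nt 681-683
EXON_NT = 609 - 504 + 1     # alternatively spliced exon, nt 504-609

print(EXON_NT, EXON_NT % 3)                              # 106 1 (not a whole number of codons)
print(same_frame(UPSTREAM_ATG, PUTATIVE_ATG))            # True  (exon present: 186 nt apart)
print(same_frame(UPSTREAM_ATG, PUTATIVE_ATG, EXON_NT))   # False (exon absent: 80 nt apart)
print((PUTATIVE_ATG - UPSTREAM_ATG) // 3)                # 62 extra codons
```

The 62 extra codons agree with the difference between the 671 and 609 amino acid products
described in Example 8 (methionine 58 versus methionine 120).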
EXAMPLE 13

Figure 17 shows data from experiments indicating that a truncated version of MCG7, when
expressed as a GST fusion protein (construct B in Fig. 18), can function as a Ras-guanine
nucleotide exchange factor. In brief, Ras (unprocessed and as a GST fusion protein) is loaded
with 3H-GDP and then incubated in the presence of excess cold GTP ± GST-MCG7. Full details of
this assay can be found in Porfiri et al.

EXAMPLE 14 


Nucleotide sequence data generated from cosmid clone cSRL-20h12 with the T7 primer
(Promega, and Applied Biosystems Incorporated dye terminator sequencing kit) were aligned
to the GenBank Expressed Sequence Tag (EST) database using the program BLASTN (Altschul
et al, 1990) and were found to match GenBank entries T78563 (clone 113434), T09103 (clone
HBBP12) and AA035643 (clone 471819). EST clones 113434 and 471819 were obtained from
Genome Systems Inc. and these DNAs were sequenced on both strands with gene-specific
primers (Table 5) to generate the cDNA sequence of mcg7 shown in Figures 13(a) and (b).

The cDNA sequence of mcg7 was translated in all possible reading frames and compared to the
GenBank non-redundant protein database using the program BLASTX (Altschul et al, 1990), and
the coding region was assigned on the basis of its homology to the C. elegans protein
F25B3.3 (Figure 14). The mcg7 cDNA composite was suspected to contain a single nucleotide
error that originated from clone 471819, and the correct nucleotide sequence was therefore
sought by reverse transcription-polymerase chain reaction (RT-PCR) of the cDNA fragment
from a human cDNA pool. Total RNA was extracted from a human lymphoblastoid cell line
using an RNeasy Mini Kit (Qiagen). cDNA synthesis was conducted with the reverse
transcriptase Superscript II RNaseH- (GIBCO BRL) and random hexamers using the procedure
recommended by the manufacturer (GIBCO BRL). One fortieth of the cDNA mix was
subjected to 35 cycles of PCR using the following cycling conditions: 94°C for 30 seconds, 58°C
for 30 seconds and 72°C for 90 seconds. The 50 µl reaction mix consisted of 1x reaction buffer
(Dade Scientific), 2 mM dNTP mix, 20 pmol of primers (see Table 6) MCG7UF (within the
variably spliced exon of Figure 13(b), between nucleotide positions 184-201) and SGCADRV2
(between nucleotide positions 866-846 of Figure 13(a)) and 10 units of Dynazyme (Dade
Scientific). The resulting PCR product was cloned into the pGEM-T vector (Promega) using
standard methodology and sequenced using gene-specific primers. The correct nucleotide
sequence of mcg7 (as shown in Figure 13(a)) matches that of the recently released GenBank entry
Y12336. A partial mouse mcg7 cDNA sequence can also be found in GenBank entry Y12339.

EXAMPLE 15 

The coding sequence of mcg7 was cloned into vectors for expression in both bacterial and
mammalian cells. In addition to the full-length constructs, the deletion constructs shown in
Figure 18 were designed to retain the guanine nucleotide exchange factor (GEF) domain. For
prokaryotic expression, the mcg7 coding region was inserted downstream of and in frame with
the Sj26 cassette of the pGEX (Pharmacia) series of vectors (Smith and Johnson, 1988) using
standard cloning techniques (Sambrook et al, 1989). For mammalian expression, the mcg7
coding sequence was first myc-tagged at the N-terminus and then ligated into the expression
vector pEXV using standard cloning techniques. Ligation junctions of the constructs were
sequenced, as the cloning strategies inadvertently changed or introduced additional amino acids
as shown below.


Construct (A): EST clone 113434 was digested with ApaI (Figure 13(a), nucleotide positions
1022 to >2416 (within the vector)), blunt-ended with T4 DNA polymerase according to the
specifications of the manufacturer (New England Biolabs) and ligated into the SmaI site of
pGEX-3X.


Sequence of the pGEX and mcg7 (underlined) junction:
pGEX-3X mcg7(1022)
Sj26 ... GGG ATC CCCiUQSIC [SEQ ID NO: 19]

additional amino acids Gly Ile Pro


Construct (B): EST clone 113434 was digested with EcoRI (Figure 13(a), nucleotide
positions <695 (within the vector) to 1711) and ligated into the EcoRI site of pGEX-1.

Sequence of the pGEX and mcg7 (underlined) junction:
pGEX-1 mcg7 (695)

Sj26 ... GAA TTC GGC ACG AGC CGA CGG [SEQ ID NO:20]

additional amino acids Glu Phe Gly Thr Ser

Construct (C): full-length mcg7. The pGEM-T clone containing the 5' end of the mcg7 coding
region was digested with ApaI (subsequently blunt-ended with T4 DNA polymerase) and BstXI
to liberate the fragment between nucleotide positions 336 and 830 of Figure 13(a). Clone
113434 was digested with BstXI and HindIII (vector derived) to liberate a fragment between
nucleotide positions 830 and >2416 (vector derived) of Figure 13(a). A pGEM-11Zf vector
(Promega) containing the myc-tag was digested with ApaI (subsequently blunt-ended with T4
DNA polymerase) and HindIII, and ligated with the two inserts described above.


Sequence of the myc-tag/mcg7 junction [SEQ ID NOs:21/22]:

myc-tag vector BamHI mcg7 5' UTR (337) start

ATGGAGCAGAAGCTGATCTCCGAGGAGGACCTG CCCGGGGCAGCTggatCcG CAGCCCACCCCGCGCCGGCGGCCATG
MEQKLISEEDL PGAAGS AAHPAPAAM

additional amino acids

The myc-tagged full-length mcg7 insert in pGEM-11Zf was then excised with SacI and HindIII
(both vector derived) and directionally cloned into the mammalian expression vector pEXV
(Beranger et al, 1994).

Construct (D): Construct (C) in pGEM-11Zf was sequentially digested with HindIII (this site
was subsequently blunt-ended with T4 DNA polymerase) then BamHI, and ligated into pGEX-2T
digested with BamHI and SmaI. Digestion with BamHI removed the myc-tag of Construct (C).



Sequence of the pGEX and mcg7 [SEQ ID NO:23/24] (underlined) junction:
pGEX-2T BamHI mcg7 (337)

Sj26 ... gga tCC GCA GCC CAC CCC GCG CCG GCG GCC ATG
Gly Ser Ala Ala His Pro Ala Pro Ala Ala Met

additional amino acids
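Whether a cloning junction adds residues and keeps the mcg7 reading frame in register can be
checked by translating the junction sequence. The sketch below, assuming the standard genetic
code, translates the Construct (B), (C) and (D) junction sequences as printed (spacing and the
lower-case lettering are ignored); the Construct (A) junction is omitted because its printed
sequence is not legible. The dictionary keys and the function name translate_junction are
illustrative only.

```python
import re
from typing import Dict

# Standard genetic code, codons enumerated in TCAG order (first base outermost).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_junction(printed: str) -> str:
    """Drop whitespace and case, then translate codon by codon from the first base."""
    dna = re.sub(r"\s+", "", printed).upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


JUNCTIONS = {
    "Construct (B), pGEX-1/mcg7": "GAA TTC GGC ACG A GCCGA CGG",
    "Construct (C), myc-tag/mcg7": "ATGGAGC AG AAGCTG ATCTC CG AGG AGGACCTG "
                                   "CCCGGGGCAGCTggatCcG CAGCCCACCCCGCGCCGGCGGCCATG",
    "Construct (D), pGEX-2T/mcg7": "gga tCC GCA GCC CAC CCC GCG CCG GCG GCC ATG",
}

for name, printed in JUNCTIONS.items():
    print(f"{name}: {translate_junction(printed)}")
```

The translations reproduce the added residues listed for each construct (Glu Phe Gly Thr Ser;
the myc epitope EQKLISEEDL followed by PGAAGS; Gly Ser), and for Constructs (C) and (D) the
reading frame runs through to the mcg7 initiation methionine.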



EXAMPLE 16 



Overnight bacterial cultures containing the pGEX plasmid were used to inoculate 500 ml of Luria
Broth containing 50 µg/ml ampicillin. The cultures were grown to an OD of ~0.8 and then
induced with 1 mM IPTG for up to 3 hours at 37°C. The bacteria were pelleted and
resuspended in 15 ml of STE buffer (10 mM Tris pH 8.0, 150 mM NaCl and 1 mM EDTA) with
1 mg/ml lysozyme. The mixture was left on ice for more than 1 hour and subsequent steps were
performed at 4°C. The protease inhibitors aprotinin, pepstatin and leupeptin were added at final
concentrations of 25 µg/ml, prior to the addition of Triton X-100 (2% v/v final) and N-lauroyl
sarcosine (1.5% w/v final). The lysate was sonicated for ~1 minute and pelleted at 14,000 x g
for 15 minutes. 100 µl of 50% w/v glutathione-sephadex bead slurry (in PBS) was added per
ml of supernatant. Following a 30 minute incubation at 4°C, the beads were washed three times
with NETN (20 mM Tris-HCl pH 8.0, 100 mM NaCl, 1 mM EDTA, 0.5% NP40), once with
NETN-HS (equivalent to NETN but with 1 M NaCl), and once in NETN. The bound protein
was either analysed directly by SDS-polyacrylamide gel electrophoresis (PAGE) as described
below or eluted from the beads with elution buffer (50 mM Tris pH 8.0, 150 mM NaCl, 5 mM
MgCl2, 1 mM DTT, 10 mM reduced glutathione) for use in GDP release assays.


EXAMPLE 17 

Twenty microlitres of GST-sepharose-bound MCG7 were added to an equal volume of 2 x
sample loading dye (100 mM Tris pH 6.8, 2% v/v mercaptoethanol, 4% w/v SDS, 0.2% w/v
bromophenol blue, 20% v/v glycerol), boiled for 5 min and loaded onto a 7.5% w/v SDS-PAGE
gel (Sambrook et al, 1989). The Coomassie brilliant blue stained gel (Sambrook et al, 1989)
typically displayed a protein doublet, running between 87-95 kDa, consisting of the MCG7-GST
fusion and a slightly smaller, co-purified contaminating E. coli protein of ~105kDa. The
calculated molecular weight of full-length MCG7 is 77.5 kDa (Construct (D)) and the GST
component has a molecular weight of 26kDa, giving a predicted fusion of ~103.5 kDa; hence, the
recombinant protein runs slightly smaller than predicted. A Western blot of the same gel probed
with anti-GST antibody yields an MCG7-specific band at the same position as that of the stained
gel.

EXAMPLE 18 

Assumptions: (a) GST-Ras molecular weight = 50 kD; (b) concentration of GST-Ras solution
= 1 mg/ml = 20 µM; (c) [3H]-GDP is 1 mCi/ml and 13.3 Ci/mmol, therefore [3H]-GDP
concentration = 75 µM and 1 pmol [3H]-GDP = 15,466 cpm; (d) Elution buffer = Buffer E = 20
mM Tris-Cl, pH 7.5; 50 mM NaCl; 5 mM MgCl2; 1 mM DTT (added just before use). Buffer E
+ BSA = Buffer E + 1 mg/ml BSA (added just before use).

Mix together, in the following order, mixing well after each addition:
10 µl (= 10 µg) GST-Ras (@ 1 mg/ml in Buffer E), 463 µl Buffer E + BSA, 7 µl [3H]-GDP,
10 µl 490 mM EDTA. Incubate @ RT for 10 min. Add 10 µl 0.5 M MgCl2 and mix well. Incubate
@ RT for 10 min. Place on ice. During the first incubation the excess EDTA concentration is
5 mM; during the second incubation the excess Mg concentration is 5 mM. The [3H]-GDP
concentration is 1 µM and the final concentration of GST-Ras is 400 nM. Thus 20 µl of the final
mix will contain 8 pmol of GST-Ras protein. The specific activity of GDP is 15,466 cpm/pmol x
(1/1.4) = 11,047 cpm/pmol.
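The concentrations in this protocol can be re-derived from the stated stock values. The sketch
below assumes a final volume of 500 µl (10 µl GST-Ras, 463 µl Buffer E + BSA, 7 µl [3H]-GDP,
10 µl EDTA and 10 µl MgCl2), which is consistent with the stated 400 nM GST-Ras. The text does
not say where the factor 1/1.4 comes from; one reading that reproduces it, labelled here as an
assumption, is that each GST-Ras molecule carries one co-purified unlabelled GDP, so that about
0.4 µM unlabelled GDP dilutes the 1 µM [3H]-GDP. The function name diluted_uM is hypothetical.

```python
def diluted_uM(stock_uM: float, volume_ul: float, total_ul: float) -> float:
    """Concentration after adding volume_ul of a stock to a final volume of total_ul."""
    return stock_uM * volume_ul / total_ul


# Stock values from assumptions (a)-(c)
RAS_STOCK_UM = 1.0 / 50_000 * 1e6          # 1 mg/ml (= 1 g/l) at 50 kD -> 20 µM
GDP_STOCK_UM = 1.0e-3 / 13.3 * 1e6         # 1 mCi/ml / 13.3 Ci/mmol = mmol/ml = mol/l -> ~75 µM
CPM_PER_PMOL_LABELLED = 15_466             # stated count rate for [3H]-GDP

# Assumed final volume: Ras + Buffer E/BSA + [3H]-GDP + EDTA + MgCl2
TOTAL_UL = 10 + 463 + 7 + 10 + 10          # 500 µl

ras_uM = diluted_uM(RAS_STOCK_UM, 10, TOTAL_UL)      # 0.4 µM = 400 nM
gdp_uM = diluted_uM(GDP_STOCK_UM, 7, TOTAL_UL)       # ~1.05 µM, quoted as 1 µM
ras_pmol_in_20ul = ras_uM * 20                       # µM x µl = pmol -> 8 pmol

# Assumption: Ras-bound unlabelled GDP (1:1) dilutes the label; with the rounded
# values 1 µM labelled + 0.4 µM unlabelled this gives the factor 1/1.4.
dilution = 1.0 / (1.0 + round(ras_uM, 1))
specific_activity = CPM_PER_PMOL_LABELLED * dilution  # ~11,047 cpm/pmol

print(f"GST-Ras {ras_uM * 1000:.0f} nM, [3H]-GDP {gdp_uM:.2f} µM, "
      f"{ras_pmol_in_20ul:.0f} pmol Ras per 20 µl, {specific_activity:,.0f} cpm/pmol")
```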

EXAMPLE 19

Exchange Ras with labelled GDP as above. Add unlabelled GTP (stock = 100 mM, pH 7) to 1
mM. Adjust the Mg concentration by adding 5 µl 0.5 M EDTA to the labelled Ras, 5 µl 0.5 M EDTA
to 500 µl MCG7, and 5 µl 0.5 M EDTA to 500 µl Buffer E + BSA. On ice, set up microfuge tubes
with 40 µl Ras-GDP (in triplicate) with 40 µl MCG7 or Buffer E + BSA (control). Transfer tubes
to a heat block @ 25°C and incubate for 10, 20 or 30 min. Stop exchange reactions with 1 ml of
ice-cold Buffer E and place on ice. Pre-soak nitrocellulose filters, pore size 0.45 µm, in Buffer E.
Assemble the vacuum manifold apparatus (Millipore) with wet filters and plug the wells with
rubber bungs. Switch on the vacuum pump. Remove the first plug, aliquot the sample and, once
it has been sucked through, wash the filter with 10 ml of ice-cold Buffer E. Remove the next plug
etc. and continue round the manifold. Take the manifold apart. Pin the filters to a pin board
reserved for [3H]. Air dry. Take up in 4 ml scintillation fluid and count. These studies have been
carried out with a truncated MCG7-GST fusion protein (amino acids 341 of Figure 13a to the stop
encoded within construct B).
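Counts retained on the filters can be converted into pmol of [3H]-GDP still bound to Ras using
the specific activity of 11,047 cpm/pmol from Example 18, and the effect of MCG7 expressed
relative to the buffer control at each time point. The sketch below is illustrative only: the
background subtraction, the ratio-to-control summary and the function names are assumptions,
not the analysis reported for Figure 17.

```python
from statistics import mean
from typing import Dict, List

# Specific activity from Example 18 (cpm per pmol of GDP in the loading mix).
SPECIFIC_ACTIVITY_CPM_PER_PMOL = 11_047


def pmol_gdp_bound(cpm: float, background_cpm: float = 0.0) -> float:
    """Convert filter-retained counts to pmol of [3H]-GDP still bound to Ras."""
    return max(cpm - background_cpm, 0.0) / SPECIFIC_ACTIVITY_CPM_PER_PMOL


def fraction_of_control(
    control_cpm: Dict[int, List[float]],
    mcg7_cpm: Dict[int, List[float]],
    background_cpm: float = 0.0,
) -> Dict[int, float]:
    """For each time point (min), mean bound GDP with MCG7 divided by the buffer-control mean."""
    result = {}
    for minutes in sorted(control_cpm):
        ctrl = mean(pmol_gdp_bound(c, background_cpm) for c in control_cpm[minutes])
        test = mean(pmol_gdp_bound(c, background_cpm) for c in mcg7_cpm[minutes])
        result[minutes] = test / ctrl
    return result
```

Called with the triplicate counts for 10, 20 and 30 min, a ratio below 1.0 would indicate faster
release of [3H]-GDP, that is, exchange activity, in the presence of MCG7.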

EXAMPLE 20

A human gene was identified from chromosome 11q13 that encodes a new member of the DnaJ
family of proteins (designated MCG18). This gene (mcg18) is expressed as an ~1.4kb mRNA
(Fig. 28) and is predicted to encode a 241 amino acid product (Fig. 19).


EXAMPLE 21 

MCG18 has partial homology to E. coli dnaJ and other human DnaJ family members in that it 
contains the J domain (Fig. 20). 


EXAMPLE 22 

MCG18 has greatest homology to functionally undefined proteins from C. elegans (Fig. 21) and
S. pombe (Fig. 22) that also feature the J domain but maintain sequence similarity through the
central and C-terminal regions of the proteins.

EXAMPLE 23 

The J domain is proposed to mediate interaction with heat shock protein 70 (Hsp70) and consists
of some 70 amino acids, frequently located at the N-terminus of the protein. One such J-domain
protein, tumorous imaginal discs (Tid58) from Drosophila virilis (Fig. 23), functions as a
tumour suppressor.


EXAMPLE 24 

A comparison of homology between MCG18 and the human DnaJ proteins HDJ-2/HSDJ,
HDJ-1/HSP40 and HSJ1 is shown in Fig. 24.

EXAMPLE 25 

During the sequence characterisation of the VRF/VEGFB promoter region on cosmid CLGW4
[Grimmond et al, 1996], which maps to chromosome 11q13, the inventors identified a sequence
that exactly matched numerous human and mouse expressed sequence tags (ESTs) in the EST
database from a gene which we designated mcg18. EST clones for human (GenBank accession
number T69741, clone 108172; accession number H40901, clone 177008) and mouse mcg18
(accession number W34884, clone 350966; accession number W64183, clone 385535) were
obtained from Genome Systems Inc. and sequenced with the gene-specific primers shown in
Table 7. The EST clones listed in Table 8 were also utilised in generating the full-length coding
sequence for human (Figure 19) and mouse (Figure 25) mcg18. The EST database also
contained mcg18 cDNA entries that were alternatively (or partially) spliced, and in order to
understand their ability to encode new polypeptides, the gene structure of mcg18 was determined
by sequencing human and mouse genomic templates with gene-specific primers.

Genomic fragments containing the human [Grimmond et al, 1996] and murine genes [Townson
et al, 1996] have been previously reported. Cosmid CLGW4 contains the entire human gene
and λ121 contains the entire mouse gene, as determined by direct sequencing of the templates
with the oligonucleotides listed in Table 7. Plasmids containing sub-fragments of λ121 and
cosmid CLGW4 were prepared using plasmid purification kits (Qiagen) and sequenced as
described previously [Grimmond et al, 1996; Townson et al, 1996] using primers designed
against cDNA and genomic sequences. The BLAST suite of programs [Altschul et al, 1990]
was used to compare the sequence data against the nucleotide and protein databases at the
National Center for Biotechnology Information (http//www.ncbi.nih.gov.nlm). The sequence
data were compiled using MacVector 4.2.1 software (IBI-Kodak). ClustalW sequence
alignments [Thompson et al, 1994] were conducted using the Australian National Genome
Information Service computer facility at the University of Sydney, Australia.

The cDNA sequence of human mcg18 (Figure 19) was translated in all possible reading frames
and compared with the GenBank non-redundant protein database using the program BLASTX
[Altschul et al, 1990]. The coding region was identified on the basis of its homology to
the DnaJ family of proteins (Figure 20). The DnaJ domain is encoded within the longest open
reading frame, and the assigned initiation codon is preceded by an in-frame stop codon (Figure
27). Similar database search results were obtained for the mouse mcg18 cDNA, and the
alignment of the human and mouse protein sequences is shown in Figure 26. MCG18 has greatest
homology to gene products from C. elegans (Figure 21) and S. pombe (Figure 22). Although
it shares a similar J-domain, MCG18 does not contain the other domains described for the tumour
suppressor gene from D. virilis (Figure 23), nor is it a homologue of other reported human
J-domain-containing proteins (Figure 24).
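The Python sketch below illustrates the reading-frame analysis described above: finding the longest open reading frame and checking for an in-frame upstream stop codon. It makes three assumptions. It uses the NCBI standard genetic code. It defines an ORF as running from the first ATG after an in-frame stop to the next stop codon. The helper names are hypothetical. This is not the BLASTX search actually used, which located the coding region by homology.

```python
from itertools import product

# NCBI standard genetic code (translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def orfs_in_frame(s: str, frame: int):
    """Yield (start, end) for each ATG...stop ORF in one frame (end includes the stop)."""
    start = None
    for i in range(frame, len(s) - 2, 3):
        codon = s[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first ATG after the previous in-frame stop
        elif start is not None and CODON_TABLE.get(codon) == "*":
            yield start, i + 3
            start = None


def longest_orf(seq: str):
    """Longest ORF over all six frames.

    Returns (strand, start, end, protein, has_upstream_stop) using 0-based
    coordinates on the reported strand, or None if no complete ORF is found.
    """
    seq = seq.upper()
    best = None
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame):
                if best is None or end - start > best[3] - best[2]:
                    best = (s, strand, start, end)
    if best is None:
        return None
    s, strand, start, end = best
    protein = "".join(CODON_TABLE.get(s[k:k + 3], "X") for k in range(start, end - 3, 3))
    # Is the assigned ATG preceded by an in-frame stop codon?
    upstream_stop = any(CODON_TABLE.get(s[k:k + 3]) == "*" for k in range(start % 3, start, 3))
    return strand, start, end, protein, upstream_stop


print(longest_orf("TGAATGGGGCTTTGTAAGTGCCCCAAGTGACC"))
# prints ('+', 3, 30, 'MGLCKCPK', True)
```

In the toy input, a TGA codon precedes the ATG in frame. This mirrors the criterion used above for accepting the assigned initiation codon.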

To determine the expression pattern of mcg18, 15 µg of total cellular RNA (RNeasy Mini Kit,
Qiagen) from various human cell lines grown in culture was electrophoresed through 1.2%
MOPS/formaldehyde gels and blotted onto nylon membranes (Amersham) by capillary transfer
using 20 x SSC (Sambrook et al, 1986). Filters were subsequently UV-fixed and hybridised
overnight at 65°C to a radiolabelled (³²P-dCTP) cDNA probe (Church and Gilbert, 1984) for
mcg18. After washes in 0.1 x SSC/0.1% w/v SDS at 65°C for 1 hour, the filters were air-dried
and exposed to X-ray film. This Northern analysis showed that mcg18 is expressed as a 1.4 kb
message in numerous tissues, including breast, ovary, bladder, lung and keratinocytes (Figure 28).
TABLE 4
ESTs matching mcg4

accession number       seq. run      organism                          score  E value   N
gb|AA399110|AA399110   zt89e06.s1    Soares testis NHT Homo sa...      1136   4.0e-168  2
gb|N39612|N39612       yy51g06.s1    Homo sapiens cDNA clone 2...      1521   5.3e-168  4
gb|AA514406|AA514406   nf57d01.s1    NCI_CGAP_Co3 Homo sapiens...      931    5.5e-166  3
gb|AA544946|AA544946   vk38e02.r1    Soares mouse mammary glan...      1207   8.4e-164  2
gb|AA450076|AA450076   zx42a04.s1    Soares total fetus Nb2HF8...      691    2.3e-160  4
gb|AA535731|AA535731   nf88f07.s1    NCI_CGAP_Co3 Homo sapiens...      796    3.5e-158  4
gb|W79710|W79710       zd86f01.r1    Soares fetal heart NbHH19...      1644   1.1e-157  4
gb|AA503531|AA503531   ne47e08.s1    NCI_CGAP_Co3 Homo sapiens...      736    4.0e-156  4
gb|AA450132|AA450132   zx42a04.r1    Soares total fetus Nb2HF8...      1955   3.9e-155  1
gb|AA398068|AA398068   zt89f06.r1    Soares testis NHT Homo sa...      1315   5.4e-148  2
gb|W60405|W60405       zd29h08.r1    Soares fetal heart NbHH19...      1022   1.8e-139  4
gb|W81382|W81382       zd86f01.s1    Soares fetal heart NbHH19...      605    3.5e-125  5
gb|AA047617|AA047617   zf13f07.s1    Soares fetal heart NbHH19...      922    4.6e-125  2
gb|AA282175|AA282175   zt02d03.s1    NCI_CGAP_GCB1 Homo sapien...      1577   2.0e-123  1
gb|AA242159|AA242159   my30d04.r1    Barstead mouse pooled org...      866    7.7e-117  2
gb|AA068680|AA068680   mm61a05.r1    Stratagene mouse embryoni...      1280   1.6e-98   1
gb|W46766|W46766       zc36b07.s1    Soares senescent fibrobla...      506    9.6e-92   3
gb|N93704|N93704       zb51c04.s1    Soares fetal lung NbHL19W...      584    9.0e-91   4
gb|AA155210|AA155210   mr98e01.r1    Stratagene mouse embryoni...      840    7.6e-87   2
gb|AA366022|AA366022   EST76915      Pineal gland II Homo sapien...    1077   2.4e-81   1
gb|AA037691|AA037691   zk34h12.s1    Soares pregnant uterus Nb...      949    2.1e-80   2
gb|W35374|W35374       zc07h03.s1    Soares parathyroid tumor...       1016   3.1e-76   1
dbj|C00696|C00696      HUMGS000B251  Human Gene Signature...           1009   1.2e-75   1
gb|T98249|T98249       ye59a07.s1    Homo sapiens cDNA clone 1...      998    6.7e-75   1
gb|W21588|W21588       zb51c04.r1    Soares fetal lung NbHL19W...      484    1.1e-69   4
gb|H32171|H32171       EST107015     Rattus sp. cDNA 5' end            828    1.1e-60   1
gb|AA108092|AA108092   mm89e06.r1    Stratagene mouse embryoni...      782    1.3e-60   2
gb|AA017857|AA017857   mh44d10.r1    Soares mouse placenta 4Nb...      665    2.5e-60   2
gb|AA037690|AA037690   zk34h12.r1    Soares pregnant uterus Nb...      540    9.4e-53   2
gb|AA531006|AA531006   nj07b11.s1    NCI_CGAP_Pr22 Homo sapien...      535    5.4e-48   2
gb|N46760|N46760       yy51g06.r1    Homo sapiens cDNA clone 2...      665    9.5e-47   1
gb|W23584|W23584       zc71d03.s1    Soares fetal heart NbHH19...      457    1.8e-44   2
gb|W42214|W42214       mc69h09.r1    Soares mouse embryo NbME1...      460    1.3e-38   3
gb|AA244877|AA244877   mx25a04.r1    Soares mouse NML Mus musc...      429    2.9e-25   1
gb|W32939|W32939       zc07h03.r1    Soares parathyroid tumor...       320    4.8e-18   1
TABLE 5
ESTs matching AA074703 (mcg4-related cDNA)

Database: Non-redundant Database of GenBank EST Division
1,222,625 sequences; 449,352,662 total letters.

Sequences producing High-scoring Segment Pairs (High Score; Smallest Sum Probability, P(N); N):

accession number       seq. run      organism                          score  P(N)      N
gb|AA074703|AA074703   zm76g07.r1    Stratagene neuroepitheli...       2071   4.0e-167  1
gb|AA068680|AA068680   mm61a05.r1    Stratagene mouse embryon...       1270   4.4e-145  4
gb|AA134788|AA134788   zm81g02.r1    Stratagene neuroepitheli...       946    1.3e-144  5
gb|AA399110|AA399110   zt89e06.s1    Soares testis NHT Homo s...       520    8.7e-119  6
gb|N39612|N39612       yy51g06.s1    Homo sapiens cDNA clone...        582    9.6e-110  7
gb|AA282175|AA282175   zt02d03.s1    NCI_CGAP_GCB1 Homo sapie...       771    9.4e-80   3
gb|W81382|W81382       zd86f01.s1    Soares fetal heart NbHH1...       329    1.6e-75   6
gb|AA544946|AA544946   vk38e02.r1    Soares mouse mammary gla...       644    9.6e-63   2
gb|W35374|W35374       zc07h03.s1    Soares parathyroid tumor...       294    4.5e-42   4
gb|W57106|W57106       md57c12.r1    Soares mouse embryo NbME...       394    1.9e-30   2
gb|AA244877|AA244877   mx25a04.r1    Soares mouse NML Mus mus...       162    2.1e-27   4
gb|AA017857|AA017857   mh44d10.r1    Soares mouse placenta 4N...       230    3.7e-23   3
gb|AA531006|AA531006   nj07b11.s1    NCI_CGAP_Pr22 Homo sapie...       139    2.3e-19   3
gb|H32171|H32171       EST107015     Rattus sp. cDNA 5' end            207    2.6e-10   2
gb|W79710|W79710       zd86f01.r1    Soares fetal heart NbHH1...       157    0.0073    1
TABLE 6
mcg7-specific oligonucleotides

name           sequence (5' to 3')            SEQ ID NO.
M1044R         GGA CAA AGT GTG TGA TGA ACC    SEQ ID NO:25
MCG7-GEF-REV2  CTC ATC CTC CGT CTG ATA CTG    SEQ ID NO:26
M7R            GTA GAT GTG GAT CAG CTT GG     SEQ ID NO:27
MCG7 CA FOR    AGG TGG AGA ATG GTC AAGG       SEQ ID NO:28
MCG7-GEF-REV   GTC ATA GTC TGT CTC CTA CT     SEQ ID NO:29
MCG7 GEF FOR   ACA TAG ACA GCG TGC CTA CC     SEQ ID NO:30
MCG7-PKC-REV   TAC AAC CTT AGG GAC ACC AG     SEQ ID NO:31
MCG7-PKC-FOR   TGC TGA GCC TGC TCA CGG TG     SEQ ID NO:32
T09103F        CAA GTG AAC AGC ACG TCC        SEQ ID NO:33
M7F            GAC TAT CTC AAG GAC CAG CTG    SEQ ID NO:34
MCG7UF         GGT TCG GTC CGA GCC CGG        SEQ ID NO:35
SGCADRV2       GGA GCG ATA CTC CAA GTA GGT    SEQ ID NO:36
TABLE 7

mcg18-SPECIFIC OLIGONUCLEOTIDES

name        sequence 5' to 3'
HVESTF      AGC GGG CCA GGC CCC TTC        [SEQ ID NO:37]
HV195F      CAT CCT GGT CCA ATG CGC TC     [SEQ ID NO:38]
HV387F2     GCA CTG AGG AAG TTA AAC GAG C  [SEQ ID NO:39]
HV408R      GCT CGT TTA ACT TCC TCA GTG C  [SEQ ID NO:40]
EXON1REV    GCT CAG CTC CAC AAA GCG GCT    [SEQ ID NO:41]
HVEST426F   ACC AGC TCC GCT CAG GTA G      [SEQ ID NO:42]
HVEST623R   TCC AGG AGC TGT GTG TTT GG     [SEQ ID NO:43]
SGVESTF3    CCA GTT TCA CAG CGT GAG G      [SEQ ID NO:44]
HVEST631R   CAG CAT GAG GAG GAG GCA G      [SEQ ID NO:45]
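Table 7 lists only the primer sequences. The Python sketch below shows a quick composition check that can be run on such primers. It computes length, GC content and a rough melting temperature by the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. The Wallace rule is a crude approximation for short oligonucleotides. It is not claimed to be how these primers were designed. The helper names are hypothetical. The sketch also confirms something visible directly in the table: HV387F2 and HV408R are exact reverse complements of each other.

```python
# Sequences copied from Table 7 (5' to 3'); the spaces are triplet grouping only.
PRIMERS = {
    "HV195F": "CAT CCT GGT CCA ATG CGC TC",
    "HV387F2": "GCA CTG AGG AAG TTA AAC GAG C",
    "HV408R": "GCT CGT TTA ACT TCC TCA GTG C",
}


def clean(seq: str) -> str:
    """Strip the triplet spacing and normalise case."""
    return seq.replace(" ", "").upper()


def revcomp(seq: str) -> str:
    return clean(seq).translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_fraction(seq: str) -> float:
    s = clean(seq)
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    s = clean(seq)
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


for name, seq in PRIMERS.items():
    print(f"{name}: {len(clean(seq))} nt, GC {gc_fraction(seq):.0%}, Tm ~{wallace_tm(seq)} C")

# HV387F2 and HV408R anneal to the same site on opposite strands.
print(revcomp(PRIMERS["HV387F2"]) == clean(PRIMERS["HV408R"]))  # True
```

For more reliable estimates, nearest-neighbour thermodynamic models would replace `wallace_tm`. The simple rule is kept here so the arithmetic is easy to check by hand.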
TABLE 8

EST CLONE SEQUENCES USED TO GENERATE HUMAN AND MOUSE
mcg18 cDNA SEQUENCE COMPOSITES

EST clone number   organism   GenBank accession number
[illegible]        human      D45683
[illegible]        human      F17225
273748             human      N37043
[illegible]        human      H40901 and H40939
258011             human      N30776
276887             human      N44004
108172             human      T69741
307529             human      W21083 and W32579
342027             human      W60283
354288             mouse      W44038
350966             mouse      W34884
426261             mouse      AA002868
368185             mouse      W53911
385535             mouse      W64183
404472             mouse      W82959
406437             mouse      W83482
SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: (OTHER THAN US): The Council of The Queensland Institute of 

Medical Research 

(US ONLY): HAYWARD Nicholas, SDLINS Ginters, GRIMMOND Sean, 
GARTSIDE Michael and HANCOCK, John

(ii) TITLE OF INVENTION: A NOVEL GENE AND USES THEREFOR

(iii) NUMBER OF SEQUENCES: 45 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: DAVIES COLLISON CAVE

(B) STREET: 1 LITTLE COLLINS STREET 

(C) CITY: MELBOURNE 

(D) STATE: VICTORIA 

(E) COUNTRY: AUSTRALIA 

(F) ZIP: 3000 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: PCT INTERNATIONAL 

(B) FILING DATE: 22-MAY-1998 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: P06973 

(B) FILING DATE: 23-MAY-1997 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: P06974 

(B) FILING DATE: 23-MAY-1997 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: P06972 

(B) FILING DATE: 23-MAY-1997 
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: PP1459 

(B) FILING DATE: 22-JAN-1998 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PP1460 

(B) FILING DATE: 22-JAN-1998 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PP1458 

(B) FILING DATE: 22-JAN-1998 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 
(A) NAME: HUGHES, DR E JOHN L 
(C) REFERENCE/DOCKET NUMBER: FJH/AF 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: +61 3 9254 2777 

(B) TELEFAX: +61 3 9254 2770 

(C) TELEX: AA 31787 
(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Cys Xaa Xaa Cys Xaa Gly Xaa Gly
1 5



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1242 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 30..959

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

TCAGTAAACA CAGAGACTGG GGATCGATC ATG GGG CTT TGT AAG TGC CCC AAG 53

Met Gly Leu Cys Lys Cys Pro Lys 
1 5 

AGA AAG GTG ACC AAC CTG TTC TGC TTC GAA CAT CGG GTC AAC GTC TGC 101
Arg Lys Val Thr Asn Leu Phe Cys Phe Glu His Arg Val Asn Val Cys 
10 15 20 

GAG CAC TGC CTG GTA GCC AAT CAC GCC AAG TGC ATC GTC CAG TCC TAC 149
Glu His Cys Leu Val Ala Asn His Ala Lys Cys Ile Val Gln Ser Tyr
25 30 35 40 

CTG CAA TGG CTC CAA GAT AGC GAC TAC AAC CCC AAT TGC CGC CTG TGC 197
Leu Gln Trp Leu Gln Asp Ser Asp Tyr Asn Pro Asn Cys Arg Leu Cys
45 50 55 

AAC ATA CCC CTG GCC AGC CGA GAG ACG ACC CGC CTT GTC TGC TAT GAT 245
Asn Ile Pro Leu Ala Ser Arg Glu Thr Thr Arg Leu Val Cys Tyr Asp
60 65 70 

CTC TTT CAC TGG GCC TGC CTC AAT GAA CGT GCT GCC CAG CTA CCC CGA 293
Leu Phe His Trp Ala Cys Leu Asn Glu Arg Ala Ala Gln Leu Pro Arg
75 80 85 

AAC ACG GCA CCT GCC GGC TAT CAG TGC CCC AGC TGC AAT GGC CCC ATC 341
Asn Thr Ala Pro Ala Gly Tyr Gln Cys Pro Ser Cys Asn Gly Pro Ile
90 95 100 

TTC CCC CCA ACC AAC CTG GCT GGC CCC GTG GCC TCC GCA CTG AGA GAG 389
Phe Pro Pro Thr Asn Leu Ala Gly Pro Val Ala Ser Ala Leu Arg Glu 
105 110 115 120 



WO 98/53061 






PCT/AU98/00380 



AAG CTG GCC ACA GTC AAC TGG GCC CGG GCA GGA CTG GGC CTC CCT CTG 437 

Lys Leu Ala Thr Val Asn Trp Ala Arg Ala Gly Leu Gly Leu Pro Leu 

125 130 135 


ATC GAT GAG GTG GTG AGC CCA GAG CCC GAG CCC CTC AAC ACG TCT GAC 485 

Ile Asp Glu Val Val Ser Pro Glu Pro Glu Pro Leu Asn Thr Ser Asp
140 145 150 

TTC TCT GAC TGG TCT AGT TTT AAT GCC AGC AGT ACC CCT GGA CCA GAG 533 
Phe Ser Asp Trp Ser Ser Phe Asn Ala Ser Ser Thr Pro Gly Pro Glu 
155 160 165 

GAG GTA GAC AGC GCC TCT GCT GCC CCA GCC TTC TAC AGC CGA GCC CCC 581 
Glu Val Asp Ser Ala Ser Ala Ala Pro Ala Phe Tyr Ser Arg Ala Pro 
170 175 180 

CGG CCC CCA GCT TCC CCA GGC CGG CCC GAG CAG CAC ACA GTG ATC CAC 629 
Arg Pro Pro Ala Ser Pro Gly Arg Pro Glu Gln His Thr Val Ile His
185 190 195 200 

ATG GGC AAT CCT GAG CCC TTG ACT CAC GCC CCT AGG AAG GTG TAT GAT 677 
Met Gly Asn Pro Glu Pro Leu Thr His Ala Pro Arg Lys Val Tyr Asp 
205 210 215 

ACG CGG GAT GAT GAC CGG ACA CCA GGC CTC CAT GGA GAC TGT GAC GAT 725 
Thr Arg Asp Asp Asp Arg Thr Pro Gly Leu His Gly Asp Cys Asp Asp 
220 225 230 

GAC AAG TAC CGA CGT CGG CCG GCC TTG GGT TGG CTG GCC CGG CTG CTA 773 
Asp Lys Tyr Arg Arg Arg Pro Ala Leu Gly Trp Leu Ala Arg Leu Leu 
235 240 245 

AGG AGC CGG GCT GGG TCT CGG AAG CGG CCG CTG ACC CTG CTC CAG CGG 821 
Arg Ser Arg Ala Gly Ser Arg Lys Arg Pro Leu Thr Leu Leu Gln Arg
250 255 260 

GCG GGG CTG CTG CTA CTC TTG GGA CTG CTG GGC TTC CTG GCC CTC CTT 869 
Ala Gly Leu Leu Leu Leu Leu Gly Leu Leu Gly Phe Leu Ala Leu Leu 
265 270 275 280 

GCC CTC ATG TCT CGC CTA GGC CGG GCC GCA GCT GAC AGC GAT CCC AAC 917 
Ala Leu Met Ser Arg Leu Gly Arg Ala Ala Ala Asp Ser Asp Pro Asn 
285 290 295 

CTG GAC CCA CTC ATG AAC CCT CAC ATC CGC GTG GGC CCC TCC TGA 962 
Leu Asp Pro Leu Met Asn Pro His Ile Arg Val Gly Pro Ser *
300 305 310 

GCCCCCTTGC TTGTGGCTAG GCCAGCCTAG GATGTGGGTT CTGTGGAGGA GAGGCGGGGT 1022 

AATGGGGAGG CTGAGGGCAC CTCTTCACTG CCCCTCTCCC TCAAGCCTAA GACACTAAGA 1082 

CCCCAGACCC AAAGCCAAGT CCACCAGAGT GGCTCGCAGG CCAGGCCTGG AGTCCCCGTG 1142 

GGTCAAGCAT TTGTCTTGAC TTGCTTTCTC CCGGGTCTCC AGCCTCCGAC CCCTCGCCCC 1202 

ATGAAGGAGC TGGCAGGTGG AAATAAACAA CAACTTTATT 1242 



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 310 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Met Gly Leu Cys Lys Cys Pro Lys Arg Lys Val Thr Asn Leu Phe Cys 
1 5 10 15

Phe Glu His Arg Val Asn Val Cys Glu His Cys Leu Val Ala Asn His 
20 25 30 

Ala Lys Cys Ile Val Gln Ser Tyr Leu Gln Trp Leu Gln Asp Ser Asp
35 40 45 

Tyr Asn Pro Asn Cys Arg Leu Cys Asn Ile Pro Leu Ala Ser Arg Glu
50 55 60 

Thr Thr Arg Leu Val Cys Tyr Asp Leu Phe His Trp Ala Cys Leu Asn 
65 70 75 80 

Glu Arg Ala Ala Gln Leu Pro Arg Asn Thr Ala Pro Ala Gly Tyr Gln
85 90 95 

Cys Pro Ser Cys Asn Gly Pro Ile Phe Pro Pro Thr Asn Leu Ala Gly
100 105 110 

Pro Val Ala Ser Ala Leu Arg Glu Lys Leu Ala Thr Val Asn Trp Ala 
115 120 125 

Arg Ala Gly Leu Gly Leu Pro Leu Ile Asp Glu Val Val Ser Pro Glu
130 135 140 

Pro Glu Pro Leu Asn Thr Ser Asp Phe Ser Asp Trp Ser Ser Phe Asn 
145 150 155 160 

Ala Ser Ser Thr Pro Gly Pro Glu Glu Val Asp Ser Ala Ser Ala Ala 
165 170 175 

Pro Ala Phe Tyr Ser Arg Ala Pro Arg Pro Pro Ala Ser Pro Gly Arg 
180 185 190 

Pro Glu Gln His Thr Val Ile His Met Gly Asn Pro Glu Pro Leu Thr
195 200 205 

His Ala Pro Arg Lys Val Tyr Asp Thr Arg Asp Asp Asp Arg Thr Pro 
210 215 220 

Gly Leu His Gly Asp Cys Asp Asp Asp Lys Tyr Arg Arg Arg Pro Ala 
225 230 235 240 

Leu Gly Trp Leu Ala Arg Leu Leu Arg Ser Arg Ala Gly Ser Arg Lys 
245 250 255

Arg Pro Leu Thr Leu Leu Gln Arg Ala Gly Leu Leu Leu Leu Leu Gly
260 265 270 

Leu Leu Gly Phe Leu Ala Leu Leu Ala Leu Met Ser Arg Leu Gly Arg 
275 280 285 

Ala Ala Ala Asp Ser Asp Pro Asn Leu Asp Pro Leu Met Asn Pro His 
290 295 300 

Ile Arg Val Gly Pro Ser
305 310 
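The feature table of SEQ ID NO:2 can be cross-checked against the length of SEQ ID NO:3 with simple arithmetic. The Python sketch below assumes, as the numbering of the listing indicates, that CDS coordinates are 1-based and inclusive and that they exclude the stop codon. The helper `cds_codons` is hypothetical.

```python
def cds_codons(start: int, end: int) -> int:
    """Number of codons in a CDS given 1-based, inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3


# SEQ ID NO:2 feature table: CDS LOCATION 30..959
n_codons = cds_codons(30, 959)
print(n_codons)  # 310, the length given for SEQ ID NO:3

# The TGA stop codon then occupies the next three nucleotides.
stop_first, stop_last = 959 + 1, 959 + 3
print(stop_first, stop_last)  # 960 962
```

The stop position 962 matches the running total printed at the end of the line containing TGA in SEQ ID NO:2.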



(2) INFORMATION FOR SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 2415 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 3..2188



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

CG ATT TCA TTC CTC GCT CCC CAC AGG TCC CTC TCC CCA AAA TAT TCC 47 
Ile Ser Phe Leu Ala Pro His Arg Ser Leu Ser Pro Lys Tyr Ser
1 5 10 15

CAT CTT GTC CTA GCC CAT CCC CCA GAC TAT CTC AAG GAC CAG CTG TCC 95 
His Leu Val Leu Ala His Pro Pro Asp Tyr Leu Lys Asp Gln Leu Ser
20 25 30 

CCA CGC CCC CGA CCT CCA CTA GGC CTG TGC CAC CCG CTG CCT GCA GGA 143 
Pro Arg Pro Arg Pro Pro Leu Gly Leu Cys His Pro Leu Pro Ala Gly 
35 40 45 

AGA CGC CCG GTC CCG GGC CGG GTT AGC CCC ATG GGA ACG CAG CGC CTG 191 
Arg Arg Pro Val Pro Gly Arg Val Ser Pro Met Gly Thr Gln Arg Leu
50 55 60 

TGT GGC CGC GGG ACT CAA GGC TGG CCT GGC TCA AGT GAA CAG CAC GTC 239 
Cys Gly Arg Gly Thr Gln Gly Trp Pro Gly Ser Ser Glu Gln His Val
65 70 75 

CAG GAG GCG ACC TCG TCC GCG GGT TTG CAT TCT GGG GTG GAC GAG CTG 287 
Gln Glu Ala Thr Ser Ser Ala Gly Leu His Ser Gly Val Asp Glu Leu
80 85 90 95 

GGG GTT CGG TCC GAG CCC GGT GGG AGG CTC CCG GAG CGC AGC CTG GGC 335 
Gly Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu Arg Ser Leu Gly 
100 105 110 

CCA GCC CAC CCC GCG CCG GCG GCC ATG GCA GGC ACC CTG GAC CTG GAC 383

Pro Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr Leu Asp Leu Asp 
115 120 125 

AAG GGC TGC ACG GTG GAG GAG CTG CTC CGC GGG TGC ATC GAA GCC TTC 431 
Lys Gly Cys Thr Val Glu Glu Leu Leu Arg Gly Cys Ile Glu Ala Phe
130 135 140 

GAT GAC TCC GGG AAG GTG CGG GAC CCG CAG CTG GTG CGC ATG TTC CTC 479 
Asp Asp Ser Gly Lys Val Arg Asp Pro Gln Leu Val Arg Met Phe Leu
145 150 155 

ATG ATG CAC CCC TGG TAC ATC CCC TCC TCT CAG CTG GCG GCC AAG CTG 527 
Met Met His Pro Trp Tyr Ile Pro Ser Ser Gln Leu Ala Ala Lys Leu
160 165 170 175 

CTC CAC ATC TAC CAA CAA TCC CGG AAG GAC AAC TCC AAT TCC CTG CAG 575 
Leu His Ile Tyr Gln Gln Ser Arg Lys Asp Asn Ser Asn Ser Leu Gln
180 185 190 

GTG AAA ACG TGC CAC CTG GTC AGG TAC TGG ATC TCC GCC TTC CCA GCG 623 
Val Lys Thr Cys His Leu Val Arg Tyr Trp Ile Ser Ala Phe Pro Ala
195 200 205 
GAG TTT GAC TTG AAC CCG GAG TTG GCT GAG CAG ATC AAG GAG CTG AAG 671
Glu Phe Asp Leu Asn Pro Glu Leu Ala Glu Gln Ile Lys Glu Leu Lys
210 215 220

GCT CTG CTA GAC CAA GAA GGG AAC CGA CGG CAC AGC AGC CTA ATC GAC 719 
Ala Leu Leu Asp Gln Glu Gly Asn Arg Arg His Ser Ser Leu Ile Asp
225 230 235 

ATA GAC AGC GTC CCT ACC TAC AAG TGG AAG CGG CAG GTG ACT CAG CGG 767 
Ile Asp Ser Val Pro Thr Tyr Lys Trp Lys Arg Gln Val Thr Gln Arg
240 245 250 255 

AAC CCT GTG GGA CAG AAA AAG CGC AAG ATG TCC CTG TTG TTT GAC CAC 815 
Asn Pro Val Gly Gln Lys Lys Arg Lys Met Ser Leu Leu Phe Asp His
260 265 270

CTG GAG CCC ATG GAG CTG GCG GAG CAT CTC ACC TAC TTG GAG TAT CGC 863 
Leu Glu Pro Met Glu Leu Ala Glu His Leu Thr Tyr Leu Glu Tyr Arg 
275 280 285 

TCC TTC TGC AAG ATC CTG TTT CAG GAC TAT CAC AGT TTC GTG ACT CAT 911 
Ser Phe Cys Lys Ile Leu Phe Gln Asp Tyr His Ser Phe Val Thr His
290 295 300 

GGC TGC ACT GTG GAC AAC CCC GTC CTG GAG CGG TTC ATC TCC CTC TTC 959 
Gly Cys Thr Val Asp Asn Pro Val Leu Glu Arg Phe Ile Ser Leu Phe
305 310 315 

AAC AGC GTC TCA CAG TGG GTG CAG CTC ATG ATC CTC AGC AAA CCC ACA 1007 
Asn Ser Val Ser Gln Trp Val Gln Leu Met Ile Leu Ser Lys Pro Thr
320 325 330 335 

GCC CCG CAG CGG GCC CTG GTC ATC ACA CAC TTT GTC CAC GTG GCG GAG 1055 
Ala Pro Gln Arg Ala Leu Val Ile Thr His Phe Val His Val Ala Glu
340 345 350 

AAG CTG CTA CAG CTG CAG AAC TTC AAC ACG CTG ATG GCA GTG GTC GGG 1103 
Lys Leu Leu Gln Leu Gln Asn Phe Asn Thr Leu Met Ala Val Val Gly
355 360 365 

GGC CTG AGC CAC AGC TCC ATC TCC CGC CTC AAG GAG ACC CAC AGC CAC 1151 
Gly Leu Ser His Ser Ser Ile Ser Arg Leu Lys Glu Thr His Ser His
370 375 380 

GTT AGC CCT GAG ACC ATC AAG CTC TGG GAG GGT CTC ACG GAA CTA GTG 1199 
Val Ser Pro Glu Thr Ile Lys Leu Trp Glu Gly Leu Thr Glu Leu Val
385 390 395 

ACG GCG ACA GGC AAC TAT GGC AAC TAC CGG CGT CGG CTG GCA GCC TGT 1247 
Thr Ala Thr Gly Asn Tyr Gly Asn Tyr Arg Arg Arg Leu Ala Ala Cys 
400 405 410 415

GTG GGC TTC CGC TTC CCG ATC CTG GGT GTG CAC CTC AAG GAC CTG GTG 1295 
Val Gly Phe Arg Phe Pro Ile Leu Gly Val His Leu Lys Asp Leu Val
420 425 430

GCC CTG CAG CTG GCA CTG CCT GAC TGG CTG GAC CCA GCC CGG ACC CGG 1343 
Ala Leu Gln Leu Ala Leu Pro Asp Trp Leu Asp Pro Ala Arg Thr Arg
435 440 445 

CTC AAC GGG GCC AAG ATG AAG CAG CTC TTT AGC ATC CTG GAG GAG CTG 1391 
Leu Asn Gly Ala Lys Met Lys Gln Leu Phe Ser Ile Leu Glu Glu Leu
450 455 460 

GCC ATG GTG ACC AGC CTG CGG CCA CCA GTA CAG GCC AAC CCC GAC CTG 1439 
Ala Met Val Thr Ser Leu Arg Pro Pro Val Gln Ala Asn Pro Asp Leu
465 470 475 
CTG AGC CTG CTC ACG GTG TCT CTG GAT CAG TAT CAG ACG GAG GAT GAG 1487
Leu Ser Leu Leu Thr Val Ser Leu Asp Gln Tyr Gln Thr Glu Asp Glu
480 485 490 495 


CTG TAC CAG CTG TCC CTG CAG CGG GAG CCG CGC TCC AAG TCC TCG CCA 1535 
Leu Tyr Gln Leu Ser Leu Gln Arg Glu Pro Arg Ser Lys Ser Ser Pro
500 505 510 

ACC AGC CCC ACG AGT TGC ACC CCA CCA CCC CGG CCC CCG GTA CTG GAG 1583 
Thr Ser Pro Thr Ser Cys Thr Pro Pro Pro Arg Pro Pro Val Leu Glu 
515 520 525 

GAG TGG ACC TCG GCT GCC AAA CCC AAG CTG GAT CAG GCC CTC GTG GTG 1631 
Glu Trp Thr Ser Ala Ala Lys Pro Lys Leu Asp Gln Ala Leu Val Val
530 535 540 

GAG CAC ATC GAG AAG ATG GTG GAG TCT GTG TTC CGG AAC TTT GAC GTC 1679 
Glu His Ile Glu Lys Met Val Glu Ser Val Phe Arg Asn Phe Asp Val
545 550 555 

GAT GGG GAT GGC CAC ATC TCA CAG GAA GAA TTC CAG ATC ATC CGT GGG 1727 
Asp Gly Asp Gly His Ile Ser Gln Glu Glu Phe Gln Ile Ile Arg Gly
560 565 570 575 

AAC TTC CCT TAC CTC AGC GCC TTT GGG GAC CTC GAC CAG AAC CAG GAT 1775 
Asn Phe Pro Tyr Leu Ser Ala Phe Gly Asp Leu Asp Gln Asn Gln Asp
580 585 590 

GGC TGC ATC AGC AGG GAG GAG ATG GTT TCC TAT TTC CTG CGC TCC AGC 1823 
Gly Cys Ile Ser Arg Glu Glu Met Val Ser Tyr Phe Leu Arg Ser Ser
595 600 605 

TCT GTG TTG GGG GGG CGC ATG GGC TTC GTA CAC AAC TTC CAG GAG AGC 1871 
Ser Val Leu Gly Gly Arg Met Gly Phe Val His Asn Phe Gln Glu Ser
610 615 620 

AAC TCC TTG CGC CCC GTC GCC TGC CGC CAC TGC AAA GCC CTG ATC CTG 1919 
Asn Ser Leu Arg Pro Val Ala Cys Arg His Cys Lys Ala Leu Ile Leu
625 630 635 

GGC ATC TAC AAG CAG GGC CTC AAA TGC CGA GCC TGT GGA GTG AAC TGC 1967 
Gly Ile Tyr Lys Gln Gly Leu Lys Cys Arg Ala Cys Gly Val Asn Cys
640 645 650 655 

CAC AAG CAG TGC AAG GAT CGC CTG TCA GTT GAG TGT CGG CGC AGG GCC 2015
His Lys Gln Cys Lys Asp Arg Leu Ser Val Glu Cys Arg Arg Arg Ala
660 665 670 

CAG AGT GTG AGC CTG GAG GGG TCT GCA CCC TCA CCC TCA CCC ATG CAC 2063 
Gln Ser Val Ser Leu Glu Gly Ser Ala Pro Ser Pro Ser Pro Met His
675 680 685 

AGC CAC CAT CAC CGC GCC TTC AGC TTC TCT CTG CCC CGC CCT GGC AGG 2111 
Ser His His His Arg Ala Phe Ser Phe Ser Leu Pro Arg Pro Gly Arg 
690 695 700 

CGA GGC TCC AGG CCT CCA GAG ATC CGT GAG GAG GAG GTA CAG ACG GTG 2159 
Arg Gly Ser Arg Pro Pro Glu Ile Arg Glu Glu Glu Val Gln Thr Val
705 710 715 

GAG GAT GGG GTG TTT GAC ATC CAC TTG TA ATAGATGCTG TGGTTGGATC 2208 
Glu Asp Gly Val Phe Asp Ile His Leu
720 725 

AAGGACTCAT TCCTGCCTTG GAGAAAATAC TTCAACCAGA GCAGGGAGCC TGGGGGTGTC 2268

GGGGCAGGAG GCTGGGGATG GGGGTGGGAT ATGAGGGTGG CATGCAGCTG AGGGCAGGGC 2328
CAGGGCTGGT GTCCCTAAGG TTGTACAGAC TCTTGTGAAT ATTTGTATTT TCCAGATGGA 2388
ATAAAAAGGC CCGTGTAATT AACCTTC 2415

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 728 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

Ile Ser Phe Leu Ala Pro His Arg Ser Leu Ser Pro Lys Tyr Ser His
1 5 10 15 

Leu Val Leu Ala His Pro Pro Asp Tyr Leu Lys Asp Gln Leu Ser Pro
20 25 30 

Arg Pro Arg Pro Pro Leu Gly Leu Cys His Pro Leu Pro Ala Gly Arg 
35 40 45 

Arg Pro Val Pro Gly Arg Val Ser Pro Met Gly Thr Gln Arg Leu Cys
50 55 60 

Gly Arg Gly Thr Gln Gly Trp Pro Gly Ser Ser Glu Gln His Val Gln
65 70 75 80 

Glu Ala Thr Ser Ser Ala Gly Leu His Ser Gly Val Asp Glu Leu Gly 
85 90 95 

Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu Arg Ser Leu Gly Pro 
100 105 110 

Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr Leu Asp Leu Asp Lys 
115 120 125 

Gly Cys Thr Val Glu Glu Leu Leu Arg Gly Cys Ile Glu Ala Phe Asp
130 135 140 

Asp Ser Gly Lys Val Arg Asp Pro Gln Leu Val Arg Met Phe Leu Met
145 150 155 160 

Met His Pro Trp Tyr Ile Pro Ser Ser Gln Leu Ala Ala Lys Leu Leu
165 170 175 

His Ile Tyr Gln Gln Ser Arg Lys Asp Asn Ser Asn Ser Leu Gln Val
180 185 190 

Lys Thr Cys His Leu Val Arg Tyr Trp Ile Ser Ala Phe Pro Ala Glu
195 200 205 

Phe Asp Leu Asn Pro Glu Leu Ala Glu Gln Ile Lys Glu Leu Lys Ala
210 215 220 

Leu Leu Asp Gln Glu Gly Asn Arg Arg His Ser Ser Leu Ile Asp Ile
225 230 235 240 

Asp Ser Val Pro Thr Tyr Lys Trp Lys Arg Gln Val Thr Gln Arg Asn
245 250 255 

Pro Val Gly Gln Lys Lys Arg Lys Met Ser Leu Leu Phe Asp His Leu
260 265 270 
Glu Pro Met Glu Leu Ala Glu His Leu Thr Tyr Leu Glu Tyr Arg Ser
275 280 285 

Phe Cys Lys Ile Leu Phe Gln Asp Tyr His Ser Phe Val Thr His Gly
290 295 300 

Cys Thr Val Asp Asn Pro Val Leu Glu Arg Phe Ile Ser Leu Phe Asn
305 310 315 320 

Ser Val Ser Gln Trp Val Gln Leu Met Ile Leu Ser Lys Pro Thr Ala
325 330 335 

Pro Gln Arg Ala Leu Val Ile Thr His Phe Val His Val Ala Glu Lys
340 345 350 

Leu Leu Gln Leu Gln Asn Phe Asn Thr Leu Met Ala Val Val Gly Gly
355 360 365 

Leu Ser His Ser Ser Ile Ser Arg Leu Lys Glu Thr His Ser His Val
370 375 380 

Ser Pro Glu Thr Ile Lys Leu Trp Glu Gly Leu Thr Glu Leu Val Thr
385 390 395 400 

Ala Thr Gly Asn Tyr Gly Asn Tyr Arg Arg Arg Leu Ala Ala Cys Val 
405 410 415 

Gly Phe Arg Phe Pro Ile Leu Gly Val His Leu Lys Asp Leu Val Ala
420 425 430 

Leu Gln Leu Ala Leu Pro Asp Trp Leu Asp Pro Ala Arg Thr Arg Leu
435 440 445 

Asn Gly Ala Lys Met Lys Gln Leu Phe Ser Ile Leu Glu Glu Leu Ala
450 455 460 

Met Val Thr Ser Leu Arg Pro Pro Val Gln Ala Asn Pro Asp Leu Leu
465 470 475 480 

Ser Leu Leu Thr Val Ser Leu Asp Gln Tyr Gln Thr Glu Asp Glu Leu
485 490 495 

Tyr Gln Leu Ser Leu Gln Arg Glu Pro Arg Ser Lys Ser Ser Pro Thr
500 505 510 

Ser Pro Thr Ser Cys Thr Pro Pro Pro Arg Pro Pro Val Leu Glu Glu 
515 520 525 

Trp Thr Ser Ala Ala Lys Pro Lys Leu Asp Gln Ala Leu Val Val Glu
530 535 540 

His Ile Glu Lys Met Val Glu Ser Val Phe Arg Asn Phe Asp Val Asp
545 550 555 560 

Gly Asp Gly His Ile Ser Gln Glu Glu Phe Gln Ile Ile Arg Gly Asn
565 570 575 



Phe Pro Tyr Leu Ser Ala Phe Gly Asp Leu Asp Gln Asn Gln Asp Gly
580 585 590 

Cys Ile Ser Arg Glu Glu Met Val Ser Tyr Phe Leu Arg Ser Ser Ser
595 600 605 

Val Leu Gly Gly Arg Met Gly Phe Val His Asn Phe Gln Glu Ser Asn
610 615 620 

Ser Leu Arg Pro Val Ala Cys Arg His Cys Lys Ala Leu Ile Leu Gly
625 630 635 640

Ile Tyr Lys Gln Gly Leu Lys Cys Arg Ala Cys Gly Val Asn Cys His
645 650 655 

Lys Gln Cys Lys Asp Arg Leu Ser Val Glu Cys Arg Arg Arg Ala Gln
660 665 670 

Ser Val Ser Leu Glu Gly Ser Ala Pro Ser Pro Ser Pro Met His Ser 
675 680 685 

His His His Arg Ala Phe Ser Phe Ser Leu Pro Arg Pro Gly Arg Arg 
690 695 700 

Gly Ser Arg Pro Pro Glu Ile Arg Glu Glu Glu Val Gln Thr Val Glu
705 710 715 720 

Asp Gly Val Phe Asp Ile His Leu
725 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2309 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 254..2083

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

CGATTTCATT CCTCGCTCCC CACAGGTCCC TCTCCCCAAA ATATTCCCAT CTTGTCCTAG 60

CCCATCCCCC AGACTATCTC AAGGACCAGC TGTCCCCACG CCCCCGACCT CCACTAGGCC 120

TGTGCCACCC GCTGCCTGCA GGAAGACGCC CGGTCCCGGG CCGGGTTAGC CCCATGGGAA 180

CGGGGTTCGG TCCGAGCCCG GTGGGAGGCT CCCGGAGCGC AGCCTGGGCC CAGCCCACCC 240

CGCGCCGGCG GCC ATG GCA GGC ACC CTG GAC CTG GAC AAG GGC TGC ACG 289
Met Ala Gly Thr Leu Asp Leu Asp Lys Gly Cys Thr 
1 5 10 

GTG GAG GAG CTG CTC CGC GGG TGC ATC GAA GCC TTC GAT GAC TCC GGG 337
Val Glu Glu Leu Leu Arg Gly Cys Ile Glu Ala Phe Asp Asp Ser Gly
15 20 25 

AAG GTG CGG GAC CCG CAG CTG GTG CGC ATG TTC CTC ATG ATG CAC CCC 385
Lys Val Arg Asp Pro Gln Leu Val Arg Met Phe Leu Met Met His Pro
30 35 40 

TGG TAC ATC CCC TCC TCT CAG CTG GCG GCC AAG CTG CTC CAC ATC TAC 433
Trp Tyr Ile Pro Ser Ser Gln Leu Ala Ala Lys Leu Leu His Ile Tyr
45 50 55 60 

CAA CAA TCC CGG AAG GAC AAC TCC AAT TCC CTG CAG GTG AAA ACG TGC 481
Gln Gln Ser Arg Lys Asp Asn Ser Asn Ser Leu Gln Val Lys Thr Cys
65 70 75 

CAC CTG GTC AGG TAC TGG ATC TCC GCC TTC CCA GCG GAG TTT GAC TTG 529
His Leu Val Arg Tyr Trp Ile Ser Ala Phe Pro Ala Glu Phe Asp Leu
80 85 90 

AAC CCG GAG TTG GCT GAG CAG ATC AAG GAG CTG AAG GCT CTG CTA GAC 577
Asn Pro Glu Leu Ala Glu Gln Ile Lys Glu Leu Lys Ala Leu Leu Asp
95 100 105 

CAA GAA GGG AAC CGA CGG CAC AGC AGC CTA ATC GAC ATA GAC AGC GTC 625 
Gln Glu Gly Asn Arg Arg His Ser Ser Leu Ile Asp Ile Asp Ser Val
110 115 120 

CCT ACC TAC AAG TGG AAG CGG CAG GTG ACT CAG CGG AAC CCT GTG GGA 673 
Pro Thr Tyr Lys Trp Lys Arg Gln Val Thr Gln Arg Asn Pro Val Gly
125 130 135 140 

CAG AAA AAG CGC AAG ATG TCC CTG TTG TTT GAC CAC CTG GAG CCC ATG 721
Gln Lys Lys Arg Lys Met Ser Leu Leu Phe Asp His Leu Glu Pro Met
145 150 155

GAG CTG GCG GAG CAT CTC ACC TAC TTG GAG TAT CGC TCC TTC TGC AAG 769 
Glu Leu Ala Glu His Leu Thr Tyr Leu Glu Tyr Arg Ser Phe Cys Lys 
160 165 170 

ATC CTG TTT CAG GAC TAT CAC AGT TTC GTG ACT CAT GGC TGC ACT GTG 817
Ile Leu Phe Gln Asp Tyr His Ser Phe Val Thr His Gly Cys Thr Val
175 180 185

GAC AAC CCC GTC CTG GAG CGG TTC ATC TCC CTC TTC AAC AGC GTC TCA 865
Asp Asn Pro Val Leu Glu Arg Phe Ile Ser Leu Phe Asn Ser Val Ser
190 195 200

CAG TGG GTG CAG CTC ATG ATC CTC AGC AAA CCC ACA GCC CCG CAG CGG 913 
Gln Trp Val Gln Leu Met Ile Leu Ser Lys Pro Thr Ala Pro Gln Arg
205 210 215 220 

GCC CTG GTC ATC ACA CAC TTT GTC CAC GTG GCG GAG AAG CTG CTA CAG 961 
Ala Leu Val Ile Thr His Phe Val His Val Ala Glu Lys Leu Leu Gln
225 230 235 

CTG CAG AAC TTC AAC ACG CTG ATG GCA GTG GTC GGG GGC CTG AGC CAC 1009 
Leu Gln Asn Phe Asn Thr Leu Met Ala Val Val Gly Gly Leu Ser His
240 245 250

AGC TCC ATC TCC CGC CTC AAG GAG ACC CAC AGC CAC GTT AGC CCT GAG 1057 
Ser Ser Ile Ser Arg Leu Lys Glu Thr His Ser His Val Ser Pro Glu
255 260 265 

ACC ATC AAG CTC TGG GAG GGT CTC ACG GAA CTA GTG ACG GCG ACA GGC 1105 
Thr Ile Lys Leu Trp Glu Gly Leu Thr Glu Leu Val Thr Ala Thr Gly
270 275 280 

AAC TAT GGC AAC TAC CGG CGT CGG CTG GCA GCC TGT GTG GGC TTC CGC 1153
Asn Tyr Gly Asn Tyr Arg Arg Arg Leu Ala Ala Cys Val Gly Phe Arg
285 290 295 300

TTC CCG ATC CTG GGT GTG CAC CTC AAG GAC CTG GTG GCC CTG CAG CTG 1201 
Phe Pro Ile Leu Gly Val His Leu Lys Asp Leu Val Ala Leu Gln Leu
305 310 315 

GCA CTG CCT GAC TGG CTG GAC CCA GCC CGG ACC CGG CTC AAC GGG GCC 1249 
Ala Leu Pro Asp Trp Leu Asp Pro Ala Arg Thr Arg Leu Asn Gly Ala 
320 325 330 

AAG ATG AAG CAG CTC TTT AGC ATC CTG GAG GAG CTG GCC ATG GTG ACC 1297
Lys Met Lys Gln Leu Phe Ser Ile Leu Glu Glu Leu Ala Met Val Thr
335 340 345 
AGC CTG CGG CCA CCA GTA CAG GCC AAC CCC GAC CTG CTG AGC CTG CTC 1345
Ser Leu Arg Pro Pro Val Gln Ala Asn Pro Asp Leu Leu Ser Leu Leu
350 355 360

ACG GTG TCT CTG GAT CAG TAT CAG ACG GAG GAT GAG CTG TAC CAG CTG 1393 
Thr Val Ser Leu Asp Gin Tyr Gin Thr Glu Asp Glu Leu Tyr Gin Leu 
365 370 375 380 

TCC CTG CAG CGG GAG CCG CGC TCC AAG TCC TCG CCA ACC AGC CCC ACG 1441 
Ser Leu Gin Arg Glu Pro Arg Ser Lys Ser Ser Pro Thr Ser Pro Thr 
385 390 395 

AGT TGC ACC CCA CCA CCC CGG CCC CCG GTA CTG GAG GAG TGG ACC TCG 1489 
Ser Cys Thr Pro Pro Pro Arg Pro Pro Val Leu Glu Glu Trp Thr Ser 
400 405 410 

GCT GCC AAA CCC AAG CTG GAT CAG GCC CTC GTG GTG GAG CAC ATC GAG 1537 
Ala Ala Lys Pro Lys Leu Asp Gin Ala Leu Val Val Glu His lie Glu 
415 420 425 

AAG ATG GTG GAG TCT GTG TTC CGG AAC TTT GAC GTC GAT GGG GAT GGC 1585 
Lys Met Val Glu Ser Val Phe Arg Asn Phe Asp Val Asp Gly Asp Gly 
430 435 440 

CAC ATC TCA CAG GAA GAA TTC CAG ATC ATC CGT GGG AAC TTC CCT TAC 1633 
His lie Ser Gin Glu Glu Phe Gin He He Arg Gly Asn Phe Pro Tyr 
445 450 455 460 

CTC AGC GCC TTT GGG GAC CTC GAC CAG AAC CAG GAT GGC TGC ATC AGC 1681 
Leu Ser Ala Phe Gly Asp Leu Asp Gin Asn Gin Asp Gly Cys He Ser 
465 470 475 

AGG GAG GAG ATG GTT TCC TAT TTC CTG CGC TCC AGC TCT GTG TTG GGG 1729 
Arg Glu Glu Met Val Ser Tyr Phe Leu Arg Ser Ser Ser Val Leu Gly 
480 485 490 

GGG CGC ATG GGC TTC GTA CAC AAC TTC CAG GAG AGC AAC TCC TTG CGC 1777 
Gly Arg Met Gly Phe Val His Asn Phe Gin Glu Ser Asn Ser Leu Arg 
495 500 505 

CCC GTC GCC TGC CGC CAC TGC AAA GCC CTG ATC CTG GGC ATC TAC AAG 1825 
Pro Val Ala Cys Arg His Cys Lys Ala Leu He Leu Gly He Tyr Lys 
510 515 520 

CAG GGC CTC AAA TGC CGA GCC TGT GGA GTG AAC TGC CAC AAG CAG TGC 1873 
Gin Gly Leu Lys Cys Arg Ala Cys Gly Val Asn Cys His Lys Gin Cys 
525 530 535 540 

AAG GAT CGC CTG TCA GTT GAG TGT CGG CGC AGG GCC CAG AGT GTG AGC 1921 
Lys Asp Arg Leu Ser Val Glu Cys Arg Arg Arg Ala Gin Ser Val Ser 
545 550 555 

CTG GAG GGG TCT GCA CCC TCA CCC TCA CCC ATG CAC AGC CAC CAT CAC 1969 
Leu Glu Gly Ser Ala Pro Ser Pro Ser Pro Met His Ser His His His 
560 565 570 

CGC GCC TTC AGC TTC TCT CTG CCC CGC CCT GGC AGG CGA GGC TCC AGG 2017 
Arg Ala Phe Ser Phe Ser Leu Pro Arg Pro Gly Arg Arg Gly Ser Arg 
575 580 585 

CCT CCA GAG ATC CGT GAG GAG GAG GTA CAG ACG GTG GAG GAT GGG GTG 2065 
Pro Pro Glu He Arg Glu Glu Glu Val Gin Thr Val Glu Asp Gly Val 
590 595 600 

TTT GAC ATC CAC TTG TAATAGATGC TGTGGTTGGA TCAAGGACTC ATTCCTGCCT 2120 
Phe Asp He His Leu 
605 610 
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TGGAGAAAAT ACTTCAACCA GAGCAGGGAG CCTGGGGGTG TCGGGGCAGG AGGCTGGGGA 2180 
TGGGGGTGGG ATATGAGGGT GGCATGCAGC TGAGGGCAGG GCCAGGGCTG GTGTCCCTAA 2240 



GGTTGTACAG ACTCTTGTGA ATATTTGTAT TTTCCAGATG GAATAAAAAG GCCCGTGTAA 2300
TTAACCTTC 2309

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 609 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Ala Gly Thr Leu Asp Leu Asp Lys Gly Cys Thr Val Glu Glu Leu
1 5 10 15

Leu Arg Gly Cys Ile Glu Ala Phe Asp Asp Ser Gly Lys Val Arg Asp
20 25 30

Pro Gln Leu Val Arg Met Phe Leu Met Met His Pro Trp Tyr Ile Pro
35 40 45

Ser Ser Gln Leu Ala Ala Lys Leu Leu His Ile Tyr Gln Gln Ser Arg
50 55 60

Lys Asp Asn Ser Asn Ser Leu Gln Val Lys Thr Cys His Leu Val Arg
65 70 75 80

Tyr Trp Ile Ser Ala Phe Pro Ala Glu Phe Asp Leu Asn Pro Glu Leu
85 90 95

Ala Glu Gln Ile Lys Glu Leu Lys Ala Leu Leu Asp Gln Glu Gly Asn
100 105 110

Arg Arg His Ser Ser Leu Ile Asp Ile Asp Ser Val Pro Thr Tyr Lys
115 120 125

Trp Lys Arg Gln Val Thr Gln Arg Asn Pro Val Gly Gln Lys Lys Arg
130 135 140

Lys Met Ser Leu Leu Phe Asp His Leu Glu Pro Met Glu Leu Ala Glu
145 150 155 160

His Leu Thr Tyr Leu Glu Tyr Arg Ser Phe Cys Lys Ile Leu Phe Gln
165 170 175

Asp Tyr His Ser Phe Val Thr His Gly Cys Thr Val Asp Asn Pro Val
180 185 190

Leu Glu Arg Phe Ile Ser Leu Phe Asn Ser Val Ser Gln Trp Val Gln
195 200 205

Leu Met Ile Leu Ser Lys Pro Thr Ala Pro Gln Arg Ala Leu Val Ile
210 215 220

Thr His Phe Val His Val Ala Glu Lys Leu Leu Gln Leu Gln Asn Phe
225 230 235 240

Asn Thr Leu Met Ala Val Val Gly Gly Leu Ser His Ser Ser Ile Ser
245 250 255

Arg Leu Lys Glu Thr His Ser His Val Ser Pro Glu Thr Ile Lys Leu
260 265 270

Trp Glu Gly Leu Thr Glu Leu Val Thr Ala Thr Gly Asn Tyr Gly Asn
275 280 285

Tyr Arg Arg Arg Leu Ala Ala Cys Val Gly Phe Arg Phe Pro Ile Leu
290 295 300

Gly Val His Leu Lys Asp Leu Val Ala Leu Gln Leu Ala Leu Pro Asp
305 310 315 320

Trp Leu Asp Pro Ala Arg Thr Arg Leu Asn Gly Ala Lys Met Lys Gln
325 330 335

Leu Phe Ser Ile Leu Glu Glu Leu Ala Met Val Thr Ser Leu Arg Pro
340 345 350

Pro Val Gln Ala Asn Pro Asp Leu Leu Ser Leu Leu Thr Val Ser Leu
355 360 365

Asp Gln Tyr Gln Thr Glu Asp Glu Leu Tyr Gln Leu Ser Leu Gln Arg
370 375 380

Glu Pro Arg Ser Lys Ser Ser Pro Thr Ser Pro Thr Ser Cys Thr Pro
385 390 395 400

Pro Pro Arg Pro Pro Val Leu Glu Glu Trp Thr Ser Ala Ala Lys Pro
405 410 415

Lys Leu Asp Gln Ala Leu Val Val Glu His Ile Glu Lys Met Val Glu
420 425 430

Ser Val Phe Arg Asn Phe Asp Val Asp Gly Asp Gly His Ile Ser Gln
435 440 445

Glu Glu Phe Gln Ile Ile Arg Gly Asn Phe Pro Tyr Leu Ser Ala Phe
450 455 460

Gly Asp Leu Asp Gln Asn Gln Asp Gly Cys Ile Ser Arg Glu Glu Met
465 470 475 480

Val Ser Tyr Phe Leu Arg Ser Ser Ser Val Leu Gly Gly Arg Met Gly
485 490 495

Phe Val His Asn Phe Gln Glu Ser Asn Ser Leu Arg Pro Val Ala Cys
500 505 510

Arg His Cys Lys Ala Leu Ile Leu Gly Ile Tyr Lys Gln Gly Leu Lys
515 520 525

Cys Arg Ala Cys Gly Val Asn Cys His Lys Gln Cys Lys Asp Arg Leu
530 535 540

Ser Val Glu Cys Arg Arg Arg Ala Gln Ser Val Ser Leu Glu Gly Ser
545 550 555 560

Ala Pro Ser Pro Ser Pro Met His Ser His His His Arg Ala Phe Ser
565 570 575

Phe Ser Leu Pro Arg Pro Gly Arg Arg Gly Ser Arg Pro Pro Glu Ile
580 585 590

Arg Glu Glu Glu Val Gln Thr Val Glu Asp Gly Val Phe Asp Ile His
595 600 605

Leu

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(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 832 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 11..733



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

GCCCGCCGCC ATG CCG CCC TTA CTG CCC CTG CGC CTG TGC CGG CTG TGG 49 
Met Pro Pro Leu Leu Pro Leu Arg Leu Cys Arg Leu Trp 
1 5 10 

CCC CGC AAC CCT CCC TCC CGG CTC CTC GGA GCG GCC GCC GGG CAG CGG 97
Pro Arg Asn Pro Pro Ser Arg Leu Leu Gly Ala Ala Ala Gly Gln Arg
15 20 25

TCC AGA CCC AGT ACT TAT TAT GAA CTG TTG GGG GTG CAT CCT GGT GCC 145
Ser Arg Pro Ser Thr Tyr Tyr Glu Leu Leu Gly Val His Pro Gly Ala
30 35 40 45

AGC ACT GAG GAA GTT AAA CGA GCT TTC TTC TCC AAG TCC AAA GAG CTG 193
Ser Thr Glu Glu Val Lys Arg Ala Phe Phe Ser Lys Ser Lys Glu Leu
50 55 60

CAC CCA GAC CGG GAC CCT GGG AAC CCA AGC CTG CAC AGC CGC TTT GTG 241
His Pro Asp Arg Asp Pro Gly Asn Pro Ser Leu His Ser Arg Phe Val
65 70 75

GAG CTG AGC GAG GCA TAC CGT GTG CTC AGC CGT GAG CAG AGC CGC CGC 289
Glu Leu Ser Glu Ala Tyr Arg Val Leu Ser Arg Glu Gln Ser Arg Arg
80 85 90

AGC TAT GAT GAC CAG CTC CGC TCA GGT AGT CCC CCA AAG TCT CCA CGA 337
Ser Tyr Asp Asp Gln Leu Arg Ser Gly Ser Pro Pro Lys Ser Pro Arg
95 100 105

ACC ACA GTC CAT GAC AAG TCT GCC CAC CAA ACA CAC AGC TCC TGG ACA 385
Thr Thr Val His Asp Lys Ser Ala His Gln Thr His Ser Ser Trp Thr
110 115 120 125

CCC CCC AAC GCA CAG TAC TGG TCC CAG TTT CAC AGC GTG AGG CCA CAG 433
Pro Pro Asn Ala Gln Tyr Trp Ser Gln Phe His Ser Val Arg Pro Gln
130 135 140

GGG CCC CAG TTG AGG CAG CAG CAA CAC AAA CAA AAC AAA CAA GTG CTG 481
Gly Pro Gln Leu Arg Gln Gln Gln His Lys Gln Asn Lys Gln Val Leu
145 150 155

GGG TAC TGC CTC CTC CTC ATG CTG GCG GGC ATG GGC CTG CAC TAC ATT 529
Gly Tyr Cys Leu Leu Leu Met Leu Ala Gly Met Gly Leu His Tyr Ile
160 165 170

GCC TTC AGG AAG GTG AAG CAG ATG CAC CTT AAC TTC ATG GAT GAA AAG 577
Ala Phe Arg Lys Val Lys Gln Met His Leu Asn Phe Met Asp Glu Lys
175 180 185

GAT CGG ATC ATC ACA GCC TTC TAC AAC GAA GCC CGG GCA CGG GCC AGG 625
Asp Arg Ile Ile Thr Ala Phe Tyr Asn Glu Ala Arg Ala Arg Ala Arg
190 195 200 205

GCC AAC AGA GGC ATC CTT CAG CAG GAG CGA CAA CGG CTA GGG CAG CGG 673
Ala Asn Arg Gly Ile Leu Gln Gln Glu Arg Gln Arg Leu Gly Gln Arg
210 215 220

CAG CCG CCA CCA TCC GAG CCA ACC CAA GGC CCC GAG ATC GTG CCC CGG 721
Gln Pro Pro Pro Ser Glu Pro Thr Gln Gly Pro Glu Ile Val Pro Arg
225 230 235

GGC GCC GGC CCC TGA GGGGCTC ACCTGGATGG GGCCTGCAGT GCGTTCCCGC 773
Gly Ala Gly Pro *
240

TTTGCTTCCT TCCCTGGACG GCCCGCTCCC CGAAACGCGC GCAATAAAGT GATTCGCAG 832



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 241 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Met Pro Pro Leu Leu Pro Leu Arg Leu Cys Arg Leu Trp Pro Arg Asn
1 5 10 15

Pro Pro Ser Arg Leu Leu Gly Ala Ala Ala Gly Gln Arg Ser Arg Pro
20 25 30

Ser Thr Tyr Tyr Glu Leu Leu Gly Val His Pro Gly Ala Ser Thr Glu
35 40 45

Glu Val Lys Arg Ala Phe Phe Ser Lys Ser Lys Glu Leu His Pro Asp
50 55 60

Arg Asp Pro Gly Asn Pro Ser Leu His Ser Arg Phe Val Glu Leu Ser
65 70 75 80

Glu Ala Tyr Arg Val Leu Ser Arg Glu Gln Ser Arg Arg Ser Tyr Asp
85 90 95

Asp Gln Leu Arg Ser Gly Ser Pro Pro Lys Ser Pro Arg Thr Thr Val
100 105 110

His Asp Lys Ser Ala His Gln Thr His Ser Ser Trp Thr Pro Pro Asn
115 120 125

Ala Gln Tyr Trp Ser Gln Phe His Ser Val Arg Pro Gln Gly Pro Gln
130 135 140

Leu Arg Gln Gln Gln His Lys Gln Asn Lys Gln Val Leu Gly Tyr Cys
145 150 155 160

Leu Leu Leu Met Leu Ala Gly Met Gly Leu His Tyr Ile Ala Phe Arg
165 170 175

Lys Val Lys Gln Met His Leu Asn Phe Met Asp Glu Lys Asp Arg Ile
180 185 190

Ile Thr Ala Phe Tyr Asn Glu Ala Arg Ala Arg Ala Arg Ala Asn Arg
195 200 205

Gly Ile Leu Gln Gln Glu Arg Gln Arg Leu Gly Gln Arg Gln Pro Pro
210 215 220

Pro Ser Glu Pro Thr Gln Gly Pro Glu Ile Val Pro Arg Gly Ala Gly
225 230 235 240

Pro



SEQ ID Nos: 10-18 25-36 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 300 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 170..300

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

CGATTTCATT CCTCGCTCCC CACAGGTCCC TCTCCCCAAA ATATTCCCAT CTTGTCCTAG 60 

CCCATCCCCC AGACTATCTC AAGGACCAGC TGTCCCCACG CCCCCGACCT CCACTAGGCC 120 

TGTGCCACCC GCTGCCTGCA GGAAGACGCC CGGTCCCGGG CCGGGTTAG CCC CAT 175 

Pro His 
1 

GGG AAC GGG GTT CGG TCC GAG CCC GGT GGG AGG CTC CCG GAG CGC AGC 223 
Gly Asn Gly Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu Arg Ser 
5 10 15 

CTG GGC CCA GCC CAC CCC GCG CCG GCG GCC ATG GCA GGC ACC CTG GAC 271 
Leu Gly Pro Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr Leu Asp 
20 25 30 

CTG GAC AAG GGC TGC ACG GTG GAG GAG CT 300 
Leu Asp Lys Gly Cys Thr Val Glu Glu Leu 
35 40 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Pro His Gly Asn Gly Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu 
1 5 10 15 

Arg Ser Leu Gly Pro Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr 
20 25 30 

Leu Asp Leu Asp Lys Gly Cys Thr Val Glu Glu Leu 
35 40 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
GGGATCCCCC TGGTC 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Asp Val Asp Glu Glu Asp Glu Val Glu Asp Ile Glu Phe
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Asp Val Asp Gly Asp Gly His Ile Ser Gln Glu Glu Phe
1 5 10

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 
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(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Asp His Asp Arg Asp Gly Phe Ile Ser Gln Glu Glu Phe
1 5 10

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Asp Gln Asn Gln Asp Gly Cys Ile Ser Arg Glu Glu Met
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Asp Val Asp Met Asp Gly Gln Ile Ser Lys Asp Glu Leu
1 5 10

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

His Phe Val His Val Ala Glu Lys Leu Leu Gln Leu Gln Asn Phe Asn
1 5 10 15

Thr Leu Met Ala Val Val Gly Gly Leu Ser His Ser Ser Ile Ser Arg
20 25 30 
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Leu Lys Glu Thr His 
35 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Lys Phe Val His Val Ala Lys His Leu Arg Lys Ile Asn Asn Phe Asn
1 5 10 15

Thr Leu Met Ser Val Val Gly Gly Ile Thr His Ser Ser Val Ala Arg
20 25 30 

Leu Ala Lys Thr Tyr 
35 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

His Asn Phe Gln Glu Ser Asn Ser Leu Arg Pro Val Ala Cys Arg His
1 5 10 15

Cys Lys Ala Leu Ile Leu Gly Ile Tyr Lys Gln Gly Leu Lys Cys Arg
20 25 30

Ala Cys Gly Val Asn Cys His Lys Gln Cys Lys Asp Arg Leu Ser Val
35 40 45 

Glu Cys 
50 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

His Asn Phe His Glu Thr Thr Phe Leu Thr Pro Thr Thr Cys Asn His 
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1 5 10 15 

Cys Asn Lys Leu Leu Trp Gly Ile Leu Arg Gln Gly Phe Lys Cys Lys
20 25 30

Asp Cys Gly Leu Ala Val His Ser Cys Cys Lys Ser Asn Ala Val Ala 
35 40 45 

Glu Cys 
50 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
GGGATCCCCC TGGTC 15 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
GAATTCGGCA CGAGCCGACG G 21 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 78 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
ATGGAGCAGA AGCTGATCTC CGAGGAGGAC CTGCCCGGGG CAGCTGGATC CGCAGCCCAC 60 
CCCGCGCCGG CGGCCATG 78 
(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Met Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Pro Gly Ala Ala Gly
1 5 10 15

Ser Ala Ala His Pro Ala Pro Ala Ala Met 
20 25 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 
GGATCCGCAG CCCACCCCGC GCCGGCGGCC ATG 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 



Gly Ser Ala Ala His Pro Ala Pro Ala Ala Met
1 5 10

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
GGACAAAGTG TGTGATGAAC C 


(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
CTCATCCTCC GTCTGATACT G 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27 
GTAGATGTGG ATCAGCTTGG 
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28 
AGGTGGAGAA TGGTCAAGG 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29
GTCATAGTCT GTCTCCTACT 
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(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30 
ACATAGACAG CGTGCCTACC 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31 
TACAACCTTA GGGACACCAG 
(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32 
TGCTGAGCCT GCTCACGGTG 
(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33 
CAAGTGAACA GCACGTCC
(2) INFORMATION FOR SEQ ID NO: 34: 
(i) SEQUENCE CHARACTERISTICS: 
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(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34 
GACTATCTCA AGGACCAGCT G 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35 
GGTTCGGTCC GAGCCCGG 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36 
GGAGCGATAC TCCAAGTAGG T 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37 
AGCGGGCCAG GCCCCTTC 



(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
CATCCTGGTC CAATGCGCTC 



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
GCACTGAGGA AGTTAAACGA GC 22 



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
GCTCGTTTAA CTTCCTCAGT GC 
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

GCTCAGCTCC ACAAAGCGGC T 21

(2) INFORMATION FOR SEQ ID NO: 42:

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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: 

ACCAGCTCCG CTCAGGTAG 19 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 
TCCAGGAGCT GTGTGTTTGG 20 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: 
CCAGTTTCAC AGCGTGAGG 19 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

CAGCATGAGG AGGAGGCAG 19

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CLAIMS: 


1 . An isolated nucleic acid molecule comprising a sequence of nucleotides encoding or 
complementary to a sequence encoding an amino acid sequence having homology to a regulator 
of gene expression or a derivative of said gene regulator. 

2. An isolated nucleic acid molecule according to claim 1 wherein the regulator
comprises a zinc finger domain of an (HC3)2 type.

3. An isolated nucleic acid molecule according to claim 2 wherein the sequence of 
nucleotides or complementary sequence of nucleotides is selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

4. An isolated nucleic acid molecule according to claim 1 wherein said gene regulator is 
a guanine nucleotide exchange factor (GEF) or a derivative thereof. 

5. An isolated nucleic acid molecule according to claim 4 wherein the sequence of 
nucleotides is selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5 or 
7; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the
nucleotide sequence set forth in (i), (ii) or (iii).

6. An isolated nucleic acid molecule according to claim 1, wherein said gene regulator 
is a heat shock protein or is a heat shock binding protein or a derivative thereof. 

7. An isolated nucleic acid molecule according to claim 6, wherein the sequence of 
nucleotides is selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:8; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

8. A genetic construct comprising a vector portion and a gene portion comprising a
regulator of gene expression or a derivative thereof.

9. A genetic construct according to claim 8 wherein the gene portion comprises a zinc
finger domain of an (HC3)2 type.

10. A genetic construct according to claim 9 wherein the gene portion comprises a 
nucleotide sequence selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii).

11. A genetic construct according to claim 8 wherein said gene portion is a guanine nucleotide
exchange factor (GEF) or a derivative thereof.

12. A genetic construct according to claim 11 wherein the gene portion comprises a 
nucleotide sequence selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5 or 
7; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

13. A genetic construct according to claim 8 wherein the gene portion is a heat shock 
protein or a derivative thereof or a heat shock binding protein or derivative thereof. 

14. A genetic construct according to claim 13 wherein the gene portion comprises a 
nucleotide sequence selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:8; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

15. A nucleic acid molecule encoding a gene regulator having the identifying 
characteristics of a molecule selected from MCG4, MCG7 and MCG18 having respective amino 
acid sequences of SEQ ID NO:3, SEQ ID NO: 5 or 7 and SEQ ID NO:9.

16. A method of detecting a condition caused or facilitated by an aberration in mcg4, said
method comprising determining the presence of a single or multiple nucleotide substitution,
deletion and/or addition or other aberration to one or both alleles of said mcg4 wherein the
presence of such a nucleotide substitution, deletion and/or addition or other aberration may be
indicative of said condition or a propensity to develop said condition.

17. A method of detecting a condition caused or facilitated by an aberration in mcg4, said
method comprising screening for a single or multiple amino acid substitution, deletion and/or
addition to MCG4 wherein the presence of such a mutation is indicative of said condition or of a
propensity to develop said condition.

18. A method for detecting MCG4 or a derivative thereof in a biological sample, said
method comprising contacting said biological sample with an antibody specific for MCG4 or its
derivatives or homologues for a time and under conditions sufficient for an antibody-MCG4
complex to form, and then detecting said complex.

19. A method of detecting a condition caused or facilitated by an aberration in mcg7, said
method comprising determining the presence of a single or multiple nucleotide substitution,
deletion and/or addition or other aberration to one or both alleles of said mcg7 wherein the
presence of such a nucleotide substitution, deletion and/or addition or other aberration may be
indicative of said condition or a propensity to develop said condition.

20. A method of detecting a condition caused or facilitated by an aberration in mcg7, said
method comprising screening for a single or multiple amino acid substitution, deletion and/or
addition to MCG7 wherein the presence of such a mutation is indicative of said condition or of a
propensity to develop said condition.

21. A method for detecting MCG7 or a derivative thereof in a biological sample, said
method comprising contacting said biological sample with an antibody specific for MCG7 or its
derivatives or homologues for a time and under conditions sufficient for an antibody-MCG7
complex to form, and then detecting said complex.

22. A method of detecting a condition caused or facilitated by an aberration in mcg18, said
method comprising determining the presence of a single or multiple nucleotide substitution,
deletion and/or addition or other aberration to one or both alleles of said mcg18 wherein the
presence of such a nucleotide substitution, deletion and/or addition or other aberration may be
indicative of said condition or a propensity to develop said condition.

23. A method of detecting a condition caused or facilitated by an aberration in mcg18, said
method comprising screening for a single or multiple amino acid substitution, deletion and/or
addition to MCG18 wherein the presence of such a mutation is indicative of said condition or of a
propensity to develop said condition.

24. A method for detecting MCG18 or a derivative thereof in a biological sample, said
method comprising contacting said biological sample with an antibody specific for MCG18 or
its derivatives or homologues for a time and under conditions sufficient for an antibody-MCG18
complex to form, and then detecting said complex.
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TCAGTAAACA CAGAGACTGG GGATCGATC ATG GGG CTT TGT AAG TGC CCC AAG 53 

Met Gly Leu Cvs Lys Cys Pro Lys 

J| 5 

AGA AAG GTG ACC AAC CTG TTC TGC TTC GAA CAT CGG GTC AAC GTC TGC 101 
Arg Lys Val Thr Asn Leu Phe Cys Phe Glu His Arg Val Asn Val Cys 
10 15 20 

GAG CAC TGC CTG GTA GCC AAT CAC GCC AAG TGC ATC GTC CAG TCC TAC 14 9 

Glu His Cys Leu Val Ala Asn His Ala Lys Cys He Val Gin Ser Tyr 
25 30 35 40 

CTG CAA TGG CTC CAA GAT AGC GAC TAC AAC CCC AAT TGC CGC CTG TGC 197 
Leu Gin Trp Leu Gin Asp Ser Asp Tyr Asn Pro Asn Cys Arg Leu Cys 
45 50 55 

AAC ATA CCC CTG GCC AGC CGA GAG ACG ACC CGC CTT GTC TGC TAT GAT 245

Asn Ile Pro Leu Ala Ser Arg Glu Thr Thr Arg Leu Val Cys Tyr Asp
60 65 70 

CTC TTT CAC TGG GCC TGC CTC AAT GAA CGT GCT GCC CAG CTA CCC CGA 293 
Leu Phe His Trp Ala Cys Leu Asn Glu Arg Ala Ala Gln Leu Pro Arg
75 80 85 

AAC ACG GCA CCT GCC GGC TAT CAG TGC CCC AGC TGC AAT GGC CCC ATC 341 
Asn Thr Ala Pro Ala Gly Tyr Gln Cys Pro Ser Cys Asn Gly Pro Ile
90 95 100 

TTC CCC CCA ACC AAC CTG GCT GGC CCC GTG GCC TCC GCA CTG AGA GAG 389 
Phe Pro Pro Thr Asn Leu Ala Gly Pro Val Ala Ser Ala Leu Arg Glu 
105 110 115 120

AAG CTG GCC ACA GTC AAC TGG GCC CGG GCA GGA CTG GGC CTC CCT CTG 437 
Lys Leu Ala Thr Val Asn Trp Ala Arg Ala Gly Leu Gly Leu Pro Leu 
125 130 135 

ATC GAT GAG GTG GTG AGC CCA GAG CCC GAG CCC CTC AAC ACG TCT GAC 485

Ile Asp Glu Val Val Ser Pro Glu Pro Glu Pro Leu Asn Thr Ser Asp
140 145 150 

TTC TCT GAC TGG TCT AGT TTT AAT GCC AGC AGT ACC CCT GGA CCA GAG 533

Phe Ser Asp Trp Ser Ser Phe Asn Ala Ser Ser Thr Pro Gly Pro Glu 
155 160 165 

GAG GTA GAC AGC GCC TCT GCT GCC CCA GCC TTC TAC AGC CGA GCC CCC 581 
Glu Val Asp Ser Ala Ser Ala Ala Pro Ala Phe Tyr Ser Arg Ala Pro 
170 175 180 

CGG CCC CCA GCT TCC CCA GGC CGG CCC GAG CAG CAC ACA GTG ATC CAC 629 
Arg Pro Pro Ala Ser Pro Gly Arg Pro Glu Gln His Thr Val Ile His
185 190 195 200 

ATG GGC AAT CCT GAG CCC TTG ACT CAC GCC CCT AGG AAG GTG TAT GAT 677 
Met Gly Asn Pro Glu Pro Leu Thr His Ala Pro Arg Lys Val Tyr Asp 
205 210 215 
ACG CGG GAT GAT GAC CGG ACA CCA GGC CTC CAT GGA GAC TGT GAC GAT 725

Thr Arg Asp Asp Asp Arg Thr Pro Gly Leu His Gly Asp Cys Asp Asp
220 225 230



GAC AAG TAC CGA CGT CGG CCG GCC TTG GGT TGG CTG GCC CGG CTG CTA 773 
Asp Lys Tyr Arg Arg Arg Pro Ala Leu Gly Trp Leu Ala Arg Leu Leu 
235 240 245 

AGG AGC CGG GCT GGG TCT CGG AAG CGG CCG CTG ACC CTG CTC CAG CGG 821 
Arg Ser Arg Ala Gly Ser Arg Lys Arg Pro Leu Thr Leu Leu Gln Arg
250 255 260 

GCG GGG CTG CTG CTA CTC TTG GGA CTG CTG GGC TTC CTG GCC CTC CTT 869
Ala Gly Leu Leu Leu Leu Leu Gly Leu Leu Gly Phe Leu Ala Leu Leu 
265 270 275 280 

GCC CTC ATG TCT CGC CTA GGC CGG GCC GCA GCT GAC AGC GAT CCC AAC 917 
Ala Leu Met Ser Arg Leu Gly Arg Ala Ala Ala Asp Ser Asp Pro Asn 
285 290 295 

CTG GAC CCA CTC ATG AAC CCT CAC ATC CGC GTG GGC CCC TCC TGA 962 
Leu Asp Pro Leu Met Asn Pro His Ile Arg Val Gly Pro Ser
300 305 310 

GCCCCCTTGC TTGTGGCTAG GCCAGCCTAG GATGTGGGTT CTGTGGAGGA GAGGCGGGGT 1022 

AATGGGGAGG CTGAGGGCAC CTCTTCACTG CCCCTCTCCC TCAAGCCTAA GACACTAAGA 1082 

CCCCAGACCC AAAGCCAAGT CCACCAGAGT GGCTCGCAGG CCAGGCCTGG AGTCCCCGTG 1142 

GGTCAAGCAT TTGTCTTGAC TTGCTTTCTC CCGGGTCTCC AGCCTCCGAC CCCTCGCCCC 1202 

ATGAAGGAGC TGGCAGGTGG AAATAAACAA CAACTTTATT 1242 
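Each amino acid row in the listing above is the standard genetic-code translation of the codon row over it, starting at the ATG at nucleotide 30 and ending at the TGA stop at nucleotides 960-962. A minimal Python sketch of that translation step, assuming the standard nuclear genetic code; `CODON_TABLE` and `translate` are illustrative names, not part of the specification:

```python
from itertools import product

# Standard genetic code. Codons are enumerated with bases in the order
# T, C, A, G (first base varying slowest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, stop_at_stop: bool = False) -> str:
    """Translate DNA codon by codon from its first nucleotide.

    Spaces (as in the codon-spaced listing) are ignored, a trailing partial
    codon is dropped and unrecognised codons become "X". With stop_at_stop,
    translation ends at the first stop codon, which is not included.
    """
    seq = dna.replace(" ", "").upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


# First two codon rows of the listing (nucleotides 30-101).
print(translate("ATG GGG CTT TGT AAG TGC CCC AAG"))  # MGLCKCPK
print(translate("AGA AAG GTG ACC AAC CTG TTC TGC TTC GAA CAT CGG GTC AAC GTC TGC"))  # RKVTNLFCFEHRVNVC
```

Applied to the end of the last codon row, `translate("CGC GTG GGC CCC TCC TGA")` gives `RVGPS*`, the C-terminal Arg-Val-Gly-Pro-Ser followed by the stop codon.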
Figure 2

gb|AA155210|AA155210 mr98e01.r1 Stratagene mouse embryonic carcinoma
(#93731^1) Mus musculus cDNA clone 605496 5'

Query: 1 MGLCKCPKRKVTNLFCFEHRVNVCEHCLVANHAKCIVQSYLQWLQDSDYNPNCRLCNIPL 60

Sb)Ct: 

Figure 3 



Query: 1 MGLCKCPKRKVTNLFCFEHRVNVCEHCLVANHAKCIVQSYLQWLQDSDYNPNCRLCNIPL 60
* MGLC KC PKRKVTNLFCFQ*RVNVCE>ICL V ANHAKC IVQSYLQWLQDS DYN FNCRLCN PL 
99 MGLC KC PKRXV'TNLFCrEHRVNVCEWCL V ANHAKC IVQSYLQWL^DS DYN PNCRLO/T PL :? : 



dbj|D75913|CEUUllG3F C. elegans cDNA clone yk111g3 : 5' end, single read.

Query: 7 PKRKVTNLFCFEHRVNVCEHCLVANHAKCIVQSYLQWLQDSDYNPNCRLCNIPLASRETT 66
PKRKVTNLF .EHRVNVCE LV NH C*VQSYL WL 0 D^PNC IT L 



Sbjct: 



Query: 67 RLVCYDLFHWACLNERAAQLPRNTAPAGYQCP 98 98 PSCNGPIFPPTN 109

RL C L HW C P TAP GY*CP P O ♦FPP'C 

Sbjct: 181 RlJ^CI>u\XHV^CFDEVOCGNFPnTTAPXGYHCP 276 275 PCCSQEVFPPDQ 310
Figure 5

sp|P46580|YLB5_CAEEL HYPOTHETICAL 146.8 KD PROTEIN C34E10.5 IN

CHROMOSOME III gi|500728 (U10402) C34E10.5 gene product
[Caenorhabditis elegans]

Query: 56 CNIPLASRETTRLVCYDLFHWACLNERAAQLPRNTAPAGYQCPSC 100

C+I L + L C LF W C+ E A +CP C

Sbjct: 1222 CSICLQ^KNPSAI^CGHLFCWTCIQEHAVAATSSASTSSARCPQC 1266 



Figure 6 

gi|703468 (L29051) homologous to GATA-binding transcription factor
(Schizosaccharomyces pombe) 

Query: 35 CIVQSYLQWLQDSDYNPNCRLCNI 58

C + +W +D NP C C +
Sbjct: 175 OVTIOTPKWRRDESGNPICNACGL 198 

Query: 162 SSTPGPEEVDSASAAPAFYSRAPRPPASPGRPEQHTVIHMGNPEPLTHAPRKVYDTRDDD 221

♦S PEE S S S P* SP ♦ +Q +1 P «V ♦ D 

Sbjct: 441 ASIXm>EEPPSNSDKQPSMSNGPKSEVSPSQSQQAPLI^ 500 

Query: 222 RTPGLH 227 

R L* 
Sbjct: 501 RNYALN 506 
Figure 7

1 2 3 4 5

WO 98/53061 PCT/AU98/00380

gb|AA074703|AA074703 zm76g07.r1 Stratagene neuroepithelium (#937231)
Homo sapiens cDNA clone 531612 5'
Length = 417



Score = 818 (226.0 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103
Identities = 206/259 (79%), Positives = 206/259 (79%), Strand = Plus / Plus



Plus Strand HSPs: 


[Alignment rows for Query 446-685 and the corresponding subject rows are not legible in the source.]



Query: 686 AAGGTGTATGATACGCGGG 704
           ||  | ||||| || | |
Sbjct: 289 AAACTATATGACACACCGC 307

Score = 230 (63.6 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103
Identities = 50/55 (90%), Positives = 50/55 (90%), Strand = Plus / Plus

[Alignment rows for this HSP are not legible in the source.]

Score = 175 (48.4 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103
Identities = 39/44 (88%), Positives = 39/44 (88%), Strand = Plus / Plus

Query: 767 GCCTTGGGTTGGCTGGCCCGGCTGCTAAGGAGCCGGGCTGGGTC 810

Query: 731 'p^****CTGTCAOGATGACAAGTACOGAOGTCOOOC 765 
Sbict- 336 .UlilillU ' 11111111 ""I 'I HHI 

Score = 133 (36.8 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103
Identities = 29/32 (90%), Positives = 29/32 (90%), Strand = Plus / Plus

Query: 701 CGGGATGATGACCGGACACCAGGCCTCCATGG 732

Sbjct: 305 COaSATCATGftCCOGACAGCACGCATrCATCG 336 
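Each "Identities = n/m (p%)" line counts identical aligned positions over the length of the HSP, and in these reports the percentage appears to be truncated rather than rounded (206/259 is 79.5% but is printed as 79%). A small Python sketch, assuming an ungapped nucleotide HSP, that rebuilds the identity midline for the legible row Query 686-704 / Sbjct 289-307 and reproduces the printed percentages; `midline` and `percent_identity` are illustrative names:

```python
def midline(query: str, sbjct: str) -> str:
    """BLAST-style midline for an ungapped row: '|' where bases match."""
    if len(query) != len(sbjct):
        raise ValueError("ungapped HSP rows must have equal length")
    return "".join("|" if q == s else " " for q, s in zip(query, sbjct))


def percent_identity(identities: int, length: int) -> int:
    """Percentage as printed in these reports: truncated, not rounded."""
    return 100 * identities // length


query = "AAGGTGTATGATACGCGGG"  # Query 686-704
sbjct = "AAACTATATGACACACCGC"  # Sbjct 289-307
line = midline(query, sbjct)
print(query)
print(line)
print(sbjct)
print(line.count("|"), "identities over", len(query))  # 12 identities over 19

# Summary figures reported for the AA074703 HSPs.
for ident, length in [(206, 259), (50, 55), (39, 44), (29, 32)]:
    print(f"{ident}/{length} ({percent_identity(ident, length)}%)")
```

The same truncation rule reproduces the other percentages in these figures, for example 94/98 printed as 95% and 44/71 printed as 61%.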
gb|AA134788|AA134788 zm81g02.r1 Stratagene neuroepithelium (#937231)
Homo sapiens cDNA clone 532082 5'
Length = 368

Plus Strand HSPs: 

Score = 563 (155.6 bits). Expect = 3.8e-87, Sum P(3) = 3.8e-87 

Identities = 147/190 (77%), Positives = 147/190 (77%), Strand = Plus / Plus



[Rows for Query 498-617 and Sbjct 103-222 are not legible in the source.]

Query: 618 CAGGCCGGCCCGAGCAGCACACAGTGATCCACATGGGCAATCCTGAGCCCTTGACTCACG 677
Sbjct: 223 CAAGCCGTCCCGAGCAGCAC[...] 282

Query: 678 CCCCTAGGAA 687
           |||| |||||
Sbjct: 283 CCCCAAGGAA 292

Score = 454 (125.4 bits), Expect = 3.8e-87, Sum P(3) = 3.8e-87
Identities = 94/98 (95%), Positives = 94/98 (95%), Strand = Plus / Plus

Query: 398 GCACTGAGAGAGAAGCTGGCCACAGTCAACTGGGCCCGGGCAGGACTGGGCCTCCCTCTG 457

Sbjct: 2 [not legible in the source] 61

Query: 458 ATCGATGAGGTGGTGAGCCCAGAGCCCGAGCCCCTCAA 495

Sbjct: 62 [not legible in the source] 99

Score = 219 (60.5 bits), Expect = 3.8e-87, Sum P(3) = 3.8e-87
Identities = 51/60 (85%), Positives = 51/60 (85%), Strand = Plus / Plus

Query: 702 GGGATGATGACCGGACACCAGGCCTCCATGGAGACTGTGACGATGACAAGTACCGACGTC 761

Sbjct: 309 OMTGATGACCGGACAGCAGOCATrW 368 



Figure 9 

W32939 human TAOX^XXTllGGAAC^^ 
AA242159 mouse CTTCCGCGCTTTTCATO
1 MGLCKCPKRK VTNLFCFEHR VNVCEHCLVA NHAKCIVQSY LQWLQDSDYN PNCRLCNIPL 60
61 ASRETTRLVC YDLFHWACLN ERAAQLPRNT APAGYQCPSC NGPIFPPTNL AGPVASALRE 120
Search Analysis for Sequence: MCG4
Search from 1 to 310
Date: September 22, 1997



Matrix: pam250 matrix
Score Region from 1 to 310
Maximum possible score: 1598



Aligned sequences: 

1. = EST AA074703 phase 1 translation 

2. = EST AA134788 phase 3 translation 

3. = EST AA134788 phase 2 translation

4. = EST AA074703 phase 3 translation 

5. = EST AA074703 phase 2 translation 

6. = EST AA134788 phase 1 translation 
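The aligned sequences above are identified by EST and "phase". The output does not define the term; assuming phase n means translating the EST's given strand starting from its nth nucleotide, the three phases can be produced as in this Python sketch (`phase_translation` is an illustrative name). The example input is nucleotides 29-41 of the Figure 1 listing, which span the MCG4 start codon.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ("*" = stop codon).
CODE = dict(zip(
    ("".join(c) for c in product("TCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))


def phase_translation(dna: str, phase: int) -> str:
    """Translate dna starting at 1-based nucleotide `phase` (1, 2 or 3).

    Assumes phase n means a reading-frame offset of n - 1 on the given
    strand; an incomplete trailing codon is ignored, unknown codons give X.
    """
    if phase not in (1, 2, 3):
        raise ValueError("phase must be 1, 2 or 3")
    seq = dna.upper()
    return "".join(
        CODE.get(seq[i:i + 3], "X") for i in range(phase - 1, len(seq) - 2, 3)
    )


fragment = "CATGGGGCTTTGT"  # Figure 1 nucleotides 29-41
for phase in (1, 2, 3):
    print(phase, phase_translation(fragment, phase))
```

Under this assumption phase 2 of the fragment gives MGLC, the first four residues of MCG4, while phases 1 and 3 give HGAL and WGF.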
Domains of MCG4

Domain map labels: zinc finger; leucine zipper; basic; residue positions 234, 241, 269, 288

zinc finger consensus: CX2HX4CX2CX4HX2CX17CX2CX18HX2CX18CX2C
acidic domain consensus: 9/34 negatively charged amino acids, 0/34 positively charged
basic domain consensus: 13/55 positively charged amino acids, 0/55 negatively charged
leucine zipper domain consensus: LX6LX6RX6LX6L

alternate "novel" leucine zipper-like motif, in which the leucines would not be aligned along
one surface of an alpha helix: (aa 261) LX6LXLX6LXLX6L (aa 286)
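The consensus lines can be checked against MCG4 as read from the amino acid rows of the Figure 1 listing. A Python sketch under stated assumptions: X stands for any residue, "positively charged" means K or R (histidine is not counted) and "negatively charged" means D or E. The window 138-171 used for the acidic domain is one 34-residue window that reproduces the 9/34 count, not a boundary given in the specification; `find` and `charge_counts` are illustrative names.

```python
import re

# MCG4 as read from the amino acid rows of the Figure 1 listing (310 residues).
MCG4 = (
    "MGLCKCPKRKVTNLFCFEHRVNVCEHCLVANHAKCIVQSYLQWLQDSDYNPNCRLCNIPL"  # 1-60
    "ASRETTRLVCYDLFHWACLNERAAQLPRNTAPAGYQCPSCNGPIFPPTNLAGPVASALRE"  # 61-120
    "KLATVNWARAGLGLPLIDEVVSPEPEPLNTSDFSDWSSFNASSTPGPEEVDSASAAPAFY"  # 121-180
    "SRAPRPPASPGRPEQHTVIHMGNPEPLTHAPRKVYDTRDDDRTPGLHGDCDDDKYRRRPA"  # 181-240
    "LGWLARLLRSRAGSRKRPLTLLQRAGLLLLLGLLGFLALLALMSRLGRAAADSDPNLDPL"  # 241-300
    "MNPHIRVGPS"                                                    # 301-310
)

# Consensus patterns as regular expressions: C, H, L and R are literal
# residues, Xn becomes .{n} and a bare X becomes ".".
ZINC_FINGER = r"C.{2}H.{4}C.{2}C.{4}H.{2}C.{17}C.{2}C.{18}H.{2}C.{18}C.{2}C"
LEUCINE_ZIPPER = r"L.{6}L.{6}R.{6}L.{6}L"
NOVEL_ZIPPER = r"L.{6}L.L.{6}L.L.{6}L"


def find(pattern: str, seq: str) -> tuple[int, int]:
    """1-based inclusive span of the first match of pattern in seq."""
    match = re.search(pattern, seq)
    if match is None:
        raise ValueError(f"no match for {pattern}")
    return match.start() + 1, match.end()


def charge_counts(seq: str, start: int, end: int) -> tuple[int, int]:
    """(negative, positive) residue counts in 1-based positions start..end.

    Negative = D or E; positive = K or R (H is deliberately not counted).
    """
    window = seq[start - 1:end]
    negative = sum(aa in "DE" for aa in window)
    positive = sum(aa in "KR" for aa in window)
    return negative, positive


print("zinc finger:", find(ZINC_FINGER, MCG4))            # (16, 100)
print("leucine zipper:", find(LEUCINE_ZIPPER, MCG4))      # (241, 269)
print("novel motif:", find(NOVEL_ZIPPER, MCG4))           # (261, 286)
print("basic 234-288:", charge_counts(MCG4, 234, 288))    # (0, 13)
print("acidic 138-171:", charge_counts(MCG4, 138, 171))   # (9, 0)
```

Under these assumptions the zinc finger pattern spans residues 16-100, the LX6LX6RX6LX6L zipper spans 241-269 and the alternate motif spans 261-286, matching the (aa 261) and (aa 286) limits given above; residues 234-288 contain 13 K/R and no D/E, the 13/55 and 0/55 quoted for the basic domain.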
FIGURE 12



Sequences producing High-scoring Segment Pairs: 



gnl|PID|e236178
gi|1293099
gi|1655941
pir||S30356
sp|P43069|CC25_CANAL
sp|P28818|GNRP_RAT
prf||1814463A
pir||B46199
gnl|PID|e238680
pir||S22693
sp | P1477 1 1 9C2 5 _ YEAST
sp|P26674|STE6_SCHPO
pir||S28407
sp|P27671|GNRP_MOUSE
gi|386047
sp|Q02342|CC25_SACKL
pir||S14177
gi|433720
gnl|PID|e241744
gi|3484
sp|P04821|CC25_YEAST
gi|915328
pir||A46199
pdb|1PTR|
gi|915330
gi|474982
gi|1763306
gi|806957
sp|Q03385|GNDS_MOUSE
pir| (BVBYT.1
gi|452242
sp|P07866|LTE1_YEAST
gi|509050
gi|520587
sp|P05130|KPC1_DROME
pir||S35704
sp|Q05655|KPCD_HUMAN
pir||S40279
sp|P09215|KPCD_RAT
gi|520878
gi|1519719



(Z70752) F25B3.3 [Caenorhabditis ele..

(U53884) aimless RasGEF [Dictyosteli..

(U67326) Ras-GRF2 (Mus musculus)
CDC25 protein homolog - yeast (Candi..
CELL DIVISION CONTROL PROTEIN 25
GUANINE NUCLEOTIDE RELEASING PROTEIN..
guanine nucleotide-releasing factor..
nucleotide-exchange-factor homolog c..

(X97560) hypothetical protein L1309 ..
CDC25 protein homolog - mouse /gi|50..
SCD25 PROTEIN /gi|457494 (H26647) SD..
CDC25 protein homolog - mouse
SCD25 protein - yeast (Saccharomyces..
Molecule: Protein Kinase C Delta Ty..
FIGURE 13(a) (i)
MCG7 - Cloning of a novel human gene that encodes a guanine exchange factor 

CGATTTCATTCCTCGCTCCCCACAGGTCCCTCTCCCCAAAATATTCCCATCTTGTCCTAG 60
I S F L A P H R S L S PKYSHLVL 19
CCCATCCCCCAGACTATCTCAAGGACCAGCTGTCCCCACGCCCCCGACCTCCACTAGGCC 120
A H P PDYLKDQLS PRPRP P LG 39 
TGTGCCACCCGCTGCCTGCAGGAAGACGCCCGGTCCCGGGCCGGGTTAGCCCCATGGGAA 180 
LCHPLPAGRRPVPGRVS PMG 59 
CGcagcgcctgtgtggccgcgggactcaaggctggcccggctcaagtgaacagcacgtcc 240 
TQRLCGRGTQGWPGSSEQHV 79
aggaggcgacctcgtccgcgggtttgcat tctggggtggacgagctggGGGTTCGGTCCG 300 
QEATSSAGLHSGVDELGVRS 99
AGCCCGGTGGGAGGCTCCCGGAGCGCAGCCTGGGCCCAGCCCACCCCGCGCCGGCGGCCA 360
E PGGRLPERS LGPAHPAPAA 119 
TGGCAGGCACCCTGGACCTGGACAAGGGCTGCACGGTGGAGGAGCTGCTCCGCGGGTGCA 420
MAGTLDLDKGCTVEELLRGC 139
TCGAAGCCTTCGATGACTCCGGGAAGGTGCGGGACCCGCAGCTGGTGCGCATGTTCCTCA 480
I EAFD DSGKVRD PQLVRMFL 159
TGATGCACCCCTGGTACATCCCCTCCTCTCAGCTGGCGGCCAAGCTGCTCCACATCTACC 540
MMHPWYIPSSQLAAKLLHIY 179
AACAATCCCGGAAGGACAACTCCAATTCCCTGCAGGTGAAAACGTGCCACCTGGTCAGGT 600 
QQSRKDNSNSLQVKTCHLVR 199 
ACTGGATCTCCGCCTTCCCAGCGGAGTTTGACTTGAACCCGGAGTTGGCTGAGCAGATCA 660
YWI SAFPAE FDLNPELAEQI 219 
AGGAGCTGAAGGCTCTGCTAGACCAAGAAGGGAACCGACGGCACAGCAGCCTAATCGACA 720 
KELKALLDQEGNRRHSS LID 239 
TAGACAGCGTCCCTACCTACAAGTGGAAGCGGCAGGTGACTCAGCGGAACCCTGTGGGAC 780 
IDSV PTYKWKRQVTQRNPVG 259 

QKKRKMSLL FDHLEPMELAE 279 

ATCTCACCTACITGGAGTATCGCTCCTTCTGCAAGATCCTGTTTCAGGACTAT 900 

HLTYLEYRS FCKILFQDYHS 299 

TCGTGACTC^TGGCTGCACTGTGGACAACCCCGTCCTGGAGCGGTTCATCTCC 960 

FVTHGCTVDNPVLERFISLF 319 

ACAGCGTCTCACAGTGGGTGCAGCTCATGATCCTCAGCAAACCCAC^GCCCCGCAGCGGG 1020 

NSV SQWVQLMILSKPTAPQR 339 

CCCTGGTCATCACACA CTTTG TCCACGTGGCGGAGAAGCTGCTACAGCTC 1080 

A L V I THFVHVAEKLLQLQNF 359 

ACACGCTGATGGCAGTGGTCGGGGGCCTGAGCCACAGCrCCATCTCCCGCCTC^ 1140 

NTLMAVVGGLSHSSISRLKE 379 
CCCACAGCCACGTTAGCCCTGAGACCATCAAGCTCTGGGAGGGTCTCACGGAACTAGTGA 1200 

THS HVS PET I KLWEGLTELV 399 

CGGCGACAGGCAACTATGGCAACTACCGGCGTCGGCTGGCAGCCTGT G TGGGCT 1260 

TATGNYGNYRRRLAACVGFR 419 

TCCCGATCCTGGGTGTGCACCTCAAGGACCTGGTGGCCCTGCAGCTGGCACTGCC^ 1320 

FP I LGVHLKDLVALQLALPD 439 
GGCTGGACCCAGCCCGGACCCGGCTCAACGGGGCCAAGATGAAGCAGCTCTTTAGCATCC 1380 

WLDPARTRLHGAKMKQLFSI 459 

TGGAGGAGCTGGCCATGGTGACCAGCCTGCGGCCACCAGTACAGGCCAAC 1440 

LEE LAMVTS LRP PVQANPDL 479 

LS LLTVSLDQYQTEDELYQL 499 
CCCTGCAGCGGGAGCCGCGCTCCAAGTCCTCGCCAACCAGCCCCACGAGTTGCACCCCAC 1560 
SLQREPRSKSSPTSPTSCTP 519
CACCCCGGCCCCCGGTACTGGAGGAGTGGACCTCGGCTGCCAAACCCAAGCTGG^ 1620 
PPRPPVLEEWTSAAKPKLDQ 539
CCCTCGTGGTGGAGCACATCGAGAAGATGGTGGAGTCTGTGTTCCGGAACTTTGACGTCG 1680
FIGURE 13(a) (ii)



ALVVEHIEKMVESVFRNFDV 559
ATGGGGATGGCCACATCTCACAGGAAGAATTCCAGATCATCCGTGGGAACTTCCCTTACC 1740
DGDGHISQEEFQIIRGNFPY 579
TCAOCGCCTTrGGGGACCTCGACCAGAACCAGGATGOrrG^ 1800 

LSAFGDLDQNQDGCISREEM 599

TTTCCTATTTCCTCCGCTCCAGCTCTGTGTTGGGGGGGCGCATGGGCTTCGTACACAACT 1860 

VSYFLRSSSVLGGRMGFVHN 619

TCCAGGAGAGCAACTCCTTGCGCCCCGTCGCCTGCCGCCACTGCAAAGCCCTGATCCTGG 1920 

FQESNSLRPVACRHCKALIL 639

GCATCTACAAGCAGGGCCTCAAATGCCGAGCCTGTGGAGTGAACTGCCACAAGCAGTGCA 1980 

GIYKQGLKCRACGVNCHKQC 659 

AGGATCGCCTGTCAGTTGAGTGTCGGCGCAGGGCCCAGAGTGTGAGCCTGGAGGGGTCTG 2040 

KDRLSVECRRRAQSVSLEGS 679

CACCCTCACCCTCACCCATGCACAGCCACCATCACCGCGCCTTCAGCTTCTCTCTGCCCC 2100

APSPSPMHSHHHRAFSFSLP 699 

GCCCTGGCAGGCGAGGCTCCAGGCCTCCAGAGATCCGTGAGGAGGAGGTACAGACGGTGG 2160 

RPGRRGSRPPEIREEEVQTV 719

AGGATGGGGTGTTTGACATCCACTTGTAATAGATGCTGTGGTTGGATCAAGGACTCATTC 2220 
EDGVFDIHL* 728

CTGCCTTGGAGAAAArACTTGAACCAGAGCAGGGAGC 2280 
TGGGGATGGGGGTGGGATATGAGGGTGGCATGCAGCTGAGGGCAGGGCCAGGGCTGGTGT 2340 
CCCTAAGGTTGTACAGACTCTTGTGAATATTTGTATTTTCCAGATGGAATAAAAAGGCCC 2400 
GTGTAATTAACCTTC (A)n
FIGURE 13(b)

i 



CGATTTCATTCCTCGCTCCCCACAGGTCCCTCTCCCCAAAATATTCCCATCTTGTCCTAG 60
CCCATCCCCCAGACTATCTCAAGGACCAGCTGTCCCCACGCCCCCGACCTCCACTAGGCC 120
TGTGCCACCCGCTGCCTGCAGGAAGACGCCCGGTCCCGGGCCGGGTTAGCCCCATGGGAA 180

* p h g n 

CGGGGTTCGGTCCGAGCCCGGTGGGAGGCTCCCGGAGCGCAGCCTGGGCCCAGCCCACCC-^C 
gv rs epgg r Ipe rs I gpahp - 

CGCGCCGGCGGC CATGG CAGGCACCCTGGACCTGGACAAGGGCTGCACGGTGGAGGAGCT 
a p a a M A G T L D L D K G C T V E E L
FIGURE 14

1 MAGTLDLDKGC. . . TVEELLRGCIEAF . . DDSGKVRDPQLVRMFLMMHPW 45 

|.:.:: |. | ::|: I - I - I I s:| ::: |.| 

1 MSSKVEEDQHQELLTEDQLVARCVZCFDynEEDEffiDI£Z^AL^LSHQW 50 

46 YIPSSQLAAKLLHIYQQSRKDNSNSLQVKTCHLVRYWISAFPAEFDLNPE 95 

. .| | ..:s::||:.|. : -h |.:||- II -II -h n 

51 LSDSLSLITHFVNFYQETRNVEQRE. . . AVCRAVSFWIEKFPMHFDAQPQ 97 

96 LAEQIKELKALLDQEGNRRHSSLIDIDSVPTYKWKRQVTQRNPVGQKK. . 143 

:..|: ||.: :: :.:| |:..:|.: | |.|. |||::-.. 

98 VCAQWRLKTIAEDINENIRNGL.DVSALPSFAWLRAVSVRKPLAKQTIV 146 

144 RKMSLLFDHLEPMELAEHLTYLEYR 168 

- :|| | s .| s:.. |..::|| 

147 RVDFETLPTPGTPPPFPIASKKFSLTAFSLSFVQASPSDISTSLSHIDYR 196 

169 SFCKILFQDYHSFVTHGCTVDNPVLERFISLFNSVSQWVQLMILSKPTAP 218 
:::| : :.. :|..| . hill ||:||.:|.||| llhl-h- 

197 vlsrisitelkqyvkix;hlrscpmle^sisvfnnlsi^cx:milnkttpk 246 

219 QW&r.VTT HFVHVAEKLLQLQNFNTLMANA^ <yT.gHgSTSRLKETHSHVSPE 268 

:|| - = • I I I I I I • I II I s • I I I : • I I : l : ^ 

247 gP^^TT.\7 VP\7W^AKHT J RKTNNF NTT.M57WGGTTH^5^ARIiAKTYAVl'SND 296 

269 TIKLWEGLTELVTATGNYGNYRRRLAAC . VGFRFPILGVHLKDLVALQLA 317 

. | :..||:|:.| hll I 1 = I I = I I I I I I I I I : ■ • _ 

297 I KKELTQLTNLLSAQHNFCEYRKALGACNKKFRI PI IGVHLKDLVAINCS 34G 

318 LPDWLDPARTRLNGAKMKQLFSILEELAMVTSLRPPV . QANPDLLSLLTV 366 

::: .. . : . : . | : . | . : | . : : : • • - I I I = - 1-1 

347 GANFEKT. . KCISSDKLVKLSKLLSNFLVFNQKGHNLPEMNMDLINTLKV 394 

367 SLDQYQTEDELYQLSLQREPRSKSSPTSPTSCTPPPRPPVLEEWTSAAKP 416 

HI .:|::|:l||.|||:- • • I • I • I = • I I -I : ■ • • 

395 SLDIRYNDDDIYELSLRREPKTFMN FEPSRGLVFAEWASGVTV 437 

417 KLUQAL WEH I EKMVE SV FRNFQYDSEGilLSCEJiEO; 1 1 RGNF r YLS AFGD 466 

|.| | .||- lh-lh-1 I II IIIIHM 1 1 1 1 = = - 1 1 - = Aon 

438 APDNATVSKHISAMVDAVFKHYDHEBDQEXSQEEEQI'IAGNFPFIDAFVN 487 

467 T. nnNonac-T sreem ^vftrss . gvr/3GRMGFVn Mpl< ^p- SMSLB PVACRHC 515 

-I : i| ||::|: .||: : I -II II hi - I 

488 i mBUKSai -^KDELK TYFmANKNTKDLRRGFKHNFHRTTFLTPTTCNHC 537 

si 6 KALTLGIVVnr.T.KrRACG V^r Hy " rK " RLSVECRRRAQSVSLEGSAPSP f 565 

. |::||.:|hll -lh-1- II - • I I I I 8 • ' "' : ' on 

SIR MKT.r.WGTI.PnnFKPKTX'GI.A VHCirrKSNAVAECRRKSSSNLTRAAEWFAS 587 

566 PMHSHHHRAFSFSLPRPGRRGSRPPEIREEEVQTVEDGVFDIHL 609 

I. I : | : -'II 1*1 : ..|:.. | -I 

588 PRGSMRSRI INTC NNSGSTPDEEIGLVSLACEEVFEDDDL 627 
FIGURE 15



human 
human 
human 
human 



CGATTTCATT 
CCCATCCCCC 
TGTQCCACCC 
CGCAGCGCCT 

AGGAGGCGAC 



CCTCGCTCCC CACAGGTCCC TCTCCCCAAA ATATTCCCAT CTTGTCCTAG 60 



AGACTATCTC AAOGACCAOC ^TGTCCCCACG CCCCCGACCT CCACTAOGCC 120 
OCTGCCTGCA GGAAGACGCC COGTCCCOOG CCOGGTTAGC CCCATOGGAA 180 
GTGTGOCCGC GGGACTCAAG GCTOGCCTOG CTCAAGTGAA CAGCACGTCC 240 
***tcag # * ****ag* # * # c #»«#*#*«* *** a *g***t> 
GGTTTQCATT CTOOOOTOGA CGAGCTOGGG GTICOGTCOG 300 

acagg 



human 
mouse 
human 
mouse 
human 
mouse 
human 
mouse 
human 
mouse 
human 
mouse 
human 
mouse 



g*#«**t**a # *-*catt** •******••• •**aa**aa* g* # ct***** **a**aat**> 

AOCCCOGTGG GAOGCTCCCG GAGCOCAOCC TGOGCCCAOC CC^CCCOGCG CCOGOOGCCA, 360 
• ** a *f»» .♦•.♦♦• tga •** t *t*a^t •♦**t # f ***-*tg**a •*•*♦*****> 
XQGCAGGCAC CCTOGACCTG GACAAGGGCT GCACGGTGGA OGAGCTGCTC CGCGGGTGCA 420 

••**ga" ## t*»*** ## ** ♦♦•••*»*t* ***•<:• ** t ** c ** t *> 

TCGAAGCCTT CGATGACTCC GOGAAOGTGC GGGACCCGCA GCTQGTGCGC ATOTTCCTCA 480 



TGATGCACCC CTGGTACATC CCCTCCTCTC AGCTOGCQGC CAAGCTGCTC CACATCTACC 540 

»•»•**•*#• *«******* a •»£•••*••* g*» a ****»* ***£*•»*£*> 

AACAATCCCG GAAGGACAAC TCCAATTCCC TGCAGGTGAA AACGTOCCAC CTGGTCAGGT 600 

•g*****«*« ••••**••£* • a *** a ***» »••*#*£*•• £**•*••»**> 

ACTGGATCTC CGCCTTCCCA GCGGAGTTTG ACTTGAACCC GGAGTTGGCT GAGCAGATCA 660 
•**•**•••* a «»«*»*«»* • •^•••* c « »••*•••••« a «*# c ##**« •* a *****«* > 

AGGAGCTGAA GGCTCTGCTA GACCAAGAAG GGAACCGACG GCACAGCAGC CTAATCGACA 720 
*••*•••»•• ******* t • ca •••••• • 

TAGACAGCGT 730 
*c**g**t* * 
FIGURE 16



CACGCCTCGGAAGGGAGGTTTGGGGTCGGTGGTTTC^CACTGAGTGTGTCTGAMCCAAA 60 
TGGTCGGAAACTCTTACCCGCTCTCCTAGGCCCGGCTAGTGGGGAC^ 120 

GCTGCCCCTCCCAAGTTCCTCCCTGTTGGCCAGGCATCCAGGTCTCCAGTCTCCGAGCro 180 
rt-PSOVPPC WPGIQVSSLRA> 
CGGAGAACCCACCGCCACATGCGGCTGCCCCTTTCCATTCGACCCTGTGGGGAGCCAGGC 2 4 0 



ft ENPPPHA AAPFHSTLWGAR> 
^CCGGGGCCCCGTTCCrCCTGTGTGAACTGGGCCCCCCGCCCCCATTCCCAC^CATCAA 300 
PRSSCVNWAPRPHSQTS> 



GGCCGCGTCTCCAGATAGCCACGATTTCATTCCTCGCTC^ 360 
PPRL0IATISFLAPHRSLSP> 

Ltattcccatc^gtcctagcccatcc-^^ 420 

vv«!HLVLAHPPDYLKDQLSP> 
GCCCCCGACCTCCACTAGGCCTGTGCCACCCGCTGCCTGCAGGAAGACGCCCGGTCCCGG 4E0 



LcGGGTTAGCCCCATGGGAACGcagcgcctgtgtggccgcgggactcaaggccggcccg 



540 



RPR PPLGLCHPLPAGRRPVP> 
: CCC ATGGGAA 

* p h Q n _ 
GRVS PMGTQRLCGRGTQGWP> 

gcccaagtgaacagcacgtccaggaggcgacctcgtccgcgggtttgcattctggggtw 600 
acgagctggG^CG^TCCGAGCCCGGTGG^ «" 

CC^CCCCGCGCCGGCGGC^ "0 
AHPAPAAMAGTLDLDKGCT V> 
FIGURE 17
FIGURE 18 (Cont. I)



SmaI/ApaI (both lost) 0.00




k pal/Smal (both lost) 1.00 



Plasmid name: clone 16 in pGEX-3X
Plasmid size: 6.00 kb
FIGURE 18 (Cont. II)

i 

EcoRI 0.00




Plasmid name: clone 19 in pGEX-1 
Plasmid size: 6.00 kb 




Plasmid name: clone 5 in pGEM-11z!
Plasmid size: 5.50 kb 
FIGURE 18 (Cont. IV)
Plasmid name: clone 27 in pGEX-2T
Plasmid size: 7.50 kb
FIGURE 19



GCCCGCCGCC ATG CCG CCC TTA CTG CCC CTG CGC CTG TGC CGG CTG TGG 49 
Met Pro Pro Leu Leu Pro Leu Arg Leu Cys Arg Leu Trp 
1 5 10 

CCC CGC AAC CCT CCC TCC CGG CTC CTC GGA GCG GCC GCC GGG CAG CGG 97 
Pro Arg Asn Pro Pro Ser Arg Leu Leu Gly Ala Ala Ala Gly Gln Arg
15 20 25

TCC AGA CCC ACT ACT TAT TAT GAA CTG TTG GGG GTG CAT CCT GGT GCC 145 
Ser Arg Pro Ser Thr Tyr Tyr Glu Leu Leu Gly Val His Pro Gly Ala 
30 35 40 45

AGC ACT GAG GAA GTT AAA CGA GCT TTC TTC TCC AAG TCC AAA GAG CTG 193 
Ser Thr Glu Glu Val Lys Arg Ala Phe Phe Ser Lys Ser Lys Glu Leu 
50 55 60 

CAC CCA GAC CGG GAC CCT GGG AAC CCA AGC CTG CAC AGC CGC TTT GTG 241 
His Pro Asp Arg Asp Pro Gly Asn Pro Ser Leu His Ser Arg Phe Val 
65 70 75

GAG CTG AGC GAG GCA TAC CGT GTG CTC AGC CCT GAG CAG AGC CGC CGC 289
Glu Leu Ser Glu Ala Tyr Arg Val Leu Ser Arg Glu Gln Ser Arg Arg
80 85 90

AGC TAT GAT GAC CAG CTC CGC TCA GGT ACT CCC CCA AAG TCT CCA CGA 337
Ser Tyr Asp Asp Gln Leu Arg Ser Gly Ser Pro Pro Lys Ser Pro Arg
95 100 105

ACC ACA GTC CAT GAC AAG TCT GCC CAC CAA ACA CAC AGC TCC TGG ACA 385
Thr Thr Val His Asp Lys Ser Ala His Gln Thr His Ser Ser Trp Thr
110 115 120

CCC CCC AAC GCA CAG TAC TGG TCC CAG TTT CAC AGC GTG AGG CCA CAG 433
Pro Pro Asn Ala Gln Tyr Trp Ser Gln Phe His Ser Val Arg Pro Gln
130 135 140

GGG CCC CAG TTG AGG CAG CAG CAA CAC AAA CAA AAC AAA CAA GTG CTG 481
Gly Pro Gln Leu Arg Gln Gln Gln His Lys Gln Asn Lys Gln Val Leu
145 150 155

GGG TAC TGC CTC CTC CTC ATG CTG GCG GGC ATG GGC CTG CAC TAC ATT 529
Gly Tyr Cys Leu Leu Leu Met Leu Ala Gly Met Gly Leu His Tyr Ile
160 165 170

GCC TTC AGG AAG GTG AAG CAG ATG CAC CTT AAC TTC ATG GAT GAA AAG 577
Ala Phe Arg Lys Val Lys Gln Met His Leu Asn Phe Met Asp Glu Lys
175 180 185

GAT CGG ATC ATC ACA GCC TTC TAC AAC GAA GCC CGG GCA CGG GCC AGG 625
FIGURE 19 (continued)

Asp Arg Ile Ile Thr Ala Phe Tyr Asn Glu Ala Arg Ala Arg Ala Arg
190 195 200 205 

GCC AAC AGA GGC ATC CTT CAG CAG GAG CGA CAA CGG CTA GGG CAG CGG 673 
Ala Asn Arg Gly Ile Leu Gln Gln Glu Arg Gln Arg Leu Gly Gln Arg
210 215 220

CAG CCG CCA CCA TCC GAG CCA ACC CAA GGC CCC GAG ATC GTC CCC CGG 721 
Gln Pro Pro Pro Ser Glu Pro Thr Gln Gly Pro Glu Ile Val Pro Arg
225 230 235 

GGC GCC GGC CCC TGA GGGGCTC ACCTGGATGG GGCCTGCAGT GCGTTCCCGC 773 
Gly Ala Gly Pro *
240 

TTTGCTTCCT TCCCTGGACG GCCCGCTCCC CGAAACGCGC GCAATAAAGT GATTCGCAG 832 
FIGURE 20

>sp|P08622|DNAJ_ECOLI DNAJ PROTEIN >pir||HHECDJ heat shock protein dnaJ -
Escherichia coli >gi|145769 (M12565) heat shock protein dnaJ
(Escherichia coli) >gi|216441 (D10483) dnaJ protein (Escherichia
coli)

Length = 376

Score = 138 (63.7 bits), Expect = 1.2e-10, P = 1.2e-10
Identities = 25/62 (40%), Positives = 39/62 (62%) 

Query: 3 5 YYEUia/HPGASTEEVra*^ 94 

YYE^LGV A Eo«-A* ♦ ♦ HPER* G* ♦♦F £♦ EAY VL* Q R ♦ 
Sbjct: 6 YYEIU^SKTAEEREIRKAYKRIJWYH^^ 65 

Query: 95 YD 96 
YD 

Sbjct: 66 YD 67 
FIGURE 21

>gi|1703590 (U80439) contains similarity to a DNAJ-like domain (Caenorhabditis
elegans)
Length = 345

Score = 98 (45.2 bits), Expect = 5.2e-12, Sum P(3) = 5.2e-12
Identities = 17/37 (45%), Positives = 28/37 (75%)

Query: 28 QRSRPSTYYELLGVHPGASTEEVKRAFFSKSKELHPD 64

R TVYE+LCV A* E*K AF*«-*SK**HPD 
Sbjct: 22 KKIRQOTWrTVTjWESTATLSEIKSAFYAQSKKVHPD 58

Score = 74 (34.1 bits), Expect = 5.2e-12, Sum P(3) = 5.2e-12
Identities = 17/32 (53%), Positives = 19/32 (59%)

Query: 71 SLHSRFVELSEAYRVLSREQSRRSYDDQLRSG 102

S ♦ F*EL AY VL R RR YD QLR C 
Sbjct: 64 SATASFLELKNAYDvTJWPADRRLYDYQLRGG 95 

Score = 39 (18.0 bits), Expect = 5.2e-12, Sum P(3) = 5.2e-12
Identities = 10/42 (23%), Positives = 19/42 (45%)

Query: 162 LLMLAGMGLHYIAFRKVKQMHLNFMDEKDRIITAFYNEARAR 203

L«-**AG Y* Q I> ♦ ♦♦0 I F ♦ R 

Sbjct: 158 LVLVAGYNC3GYLYIXAYNQKQLDKL I DEDE IAKCFLRQKEFR 199 



>gnl|PID|e281266 (Z81030) C01G10.12 (Caenorhabditis elegans)
Length = 191 

Score = 96 (44.3 bits), Expect = 1.8e-09, Sum P(3) = 1.8e-09
Identities = 17/41 (41%), Positives = 27/41 (65%)

Query: 35 YYELLGVHPGASTEEVKRAFFSKSKELHPDRDPGNPSLHSR 75

A* AF K*K*LHPD* ♦ SR 

Sbjct: 19 YYEI IGVSASATRQEIRDAFLKKTKQtilPDQSRKSSKSDSR 59 

Score = 54 (24.9 bits), Expect = 1.8e-09, Sum P(3) = 1.8e-09
Identities = 10/22 (45%), Positives = 15/22 (68%)

Query: 75 RFVELSEAYRVLSREQSRRSYD 96

♦F* ♦ EAY VL E> R* YD 
Sbjct: 71 QFMLVXEAYUVLH*!2QCRKEYD 92 

Score = 35 (16.1 bits), Expect = 1.8e-09, Sum P(3) = 1.8e-09
Identities = 9/44 (20%), Positives = 22/44 (50%)

Query: 141 QGPQLRQQQHKQNKQVLGYCLLLMLAGMGLHYIAFRKVKQMHLN 184

♦ P* * KQ **A *C ♦ ♦ RK** 

Sbjct: 145 RNPQ3EYLREXQKNRMLWLAATVMALIGAOT 188 
FIGURE 22

>sp|Q10209|YAY1_SCHPO HYPOTHETICAL 44.8 KD PROTEIN C4H3.01 IN CHROMOSOME I
>gi|1184014 (Z69380) unknown [Schizosaccharomyces pombe]
Length = 392

Score = 84 (38.8 bits), Expect = 4.1e-08, Sum P(3) = 4.1e-08
Identities = 13/36 (36%), Positives = 25/36 (69%)

Query: 35 YYELLGVHPGASTEEVKRAFFSKSKELHPDRDPGNP 70

YY+LLG* A* ♦♦K*A* ♦ * HPD*«-P *P 
Sbjct: 9 YYDLLG ISTDATAVD IKKAYRKIAVKYHPDKNPDDP 44 

Score = 64 (29.5 bits), Expect = 4.1e-08, Sum P(3) = 4.1e-08
Identities = 14/40 (35%), Positives = 23/40 (57%)

Query: 75 RFVELSEAYRVLSREQSRRSYDDQLRSGSPPKSPRTTVHD 114

♦ F ♦♦SEAY+VL E«» R YD ♦ ♦ P* T *D 
Sbjct: 50 KFQKI SEAYQVX/SDEKLRSQYDQFGKHCAVPEQGFTDAYD 89 

Score = 37 (17.1 bits), Expect = 4.1e-08, Sum P(3) = 4.1e-08
Identities = 9/29 (31%), Positives = 15/29 (51%)

Query: 190 DRIITAFYNEARARARANRGILQQERQRL 218

DR A E A A* ♦ RQR* 
Sbjct: 149 DRKW1AQ I REREALAKREQEMI EDRRQR I 177 

Score = 33 (15.2 bits), Expect = 0.00081, Sum P(3) = 0.00081
Identities = 8/19 (42%), Positives = 11/19 (57%)

Query: 140 PQGPQLRQQQHKQNKQVLG 158

PQG ♦ Q*> * QVLG 
Sbjct: 44 PQGASEKFQKISEAYQVLG 62 



FIGURE 23 

>gnl|PID|e253406 (X77635) tumorous imaginal discs (Drosophila virilis)

>gnl|PID|e263866 (Y07700) Tid58 protein (Drosophila virilis)
Length = 529

Score = 153 (70.6 bits), Expect = 9.7e-13, P = 9.7e-13
Identities = 27/71 (38%), Positives = 44/71 (61%)

Query: 26 AGQRSRPSTYYEXXCTVHFGASTEEVKR 85 

» r ♦ YY LOT A* ♦♦♦K«-A** HPD ♦ *F ♦♦SEAY V 

Sbjct: 72 SSSRMQAKDYYATlGVAKNANAroiKKA 131 

Query: 86 LSREQSRRSYD 96 

LS 4^ RR YD 
Sbjct: 132 LSDDQKRREYD 142 



FIGURE 24



MCG18 - - -MPPLLPLRLCRLWP-RN- -PP SRLLGAA 

HDJ-2 MVKCTTYYDVLGVK PNATQEQJCKAYHKLALKYHPDKN- - PN EGEKFKQ I SQAYEV 

HDJ- 1 MGKD- - YYQTLCLARGASDEEIKBAYRRQALRYHPDKNKEPG AEEXFKEIAEAYDV 

HSJ1 •YYEILDVPRSASADDIKKAYfelKALQWHPDKN- - PDNKEFAEKKFKEVAEAYEV 



MCG18 AGQRSRPSTY- - YELLGVH PGA ST-EEVKRAFFS-- 

HDJ- 2 LSDAXKFELYDKOGEQAIK- - - BGGAGOG — FCSPMDIFT HFTO GC 

HDJ- 1 LSDPRKREIFDRYGEBGLKGSGP SGGSGGGMCTSFSYTFHGDPHAMF AEFFG- - 

HSJ1 LSDKHKREIYDRYGRBGLTGTGTGPSRAEACSGCP — G- -FTFT -FRSPEEVFREFFG— 

. 

MCG18 KSKELHPDRDPGNP SLHSRFVELSEAYKVLSREQSRRS- - YDDQLR9GSPPKSPRT 

HDJ- 2 GRMQREWlGKNVvHQLSvTLEDLYNGATRK^ 

HDJ-1 GRNPFT7TFTGQRM3EBC24DI DDPFSGFPMGJOSF^^ - RJSAQEPARKXQDPPVT 

HSJ1 SGDPFAELFDDLGP- -FSELQNRGSRHSGPFFTFSSSFPGHSDFSSSSFSFSPGAGAFRS 



MCG18 TVHDKSAHQTOSSWTPPNAQY WSQFHSVRPQ GP QLRQQQHXQN 

HDJ-2 TC^IRIHQIGPCMVQQIQSVCMECQOTGEEISP^^^ 

HDJ- 1 HDLRVSLEE IYSGCTKKMK ISH-KRLNP— D GKSIRNEDKILTIEVKK 

HSJ1 VSTSTTFVQGRR ITTRR IME NGQ-ERVEVEED GQ LKSVTINGVPD 



MCG18 KQVLCYCLLL MIJVCM^HYIAFRKVKQMHI2^F^^E-KDRIITAFY^£EARARAR^ 

HDJ- 2 GMKDGQKITFHGBGDQEPGLEPGDI I IVLDQXDHAVFTRRGEDLFMCMDIQLVEALCGPQ 

HDJ- 1 CWKnnTCITFTKBGDQTSNNIPADrVFVlOTKP^ 

HSJ 1 DLARGLELSR- RE- -QQP- SVTSRSGGTQVWTPASCPLD- SDLSEDEDLQLAMAYSLSE 



MCG18 RGILQQERQRLGQRQPP-PSEPTQGPEIVPRGAGP 

HDJ-2 KPISTUKRTrVITSHPG0IVKHC2)IKCVLNBGMP 

HDJ-1 VNVPTLDGRTIPWFK- -DVIRPGMRRKVPGBGLPLPCTPEKRGDLI IEFEVTFPER- - 1 

HSJ1 MEAAGKKPAGGREAQHR-RQGRPRPSTTIQAWOGP— RR- -VRG-- VKQPNAVHFQR-RR 

MCG18

HDJ- 2 SPDKLSIXEKLLPERKEVEETOEMDQVELVDFDPTOER 

HDJ-1 PQTSRTVLEQVLPI 

HSJ1 PLAASSSEMRAQPD LIQILTGGSDSLWEEKRGVS 

MCG18

HDJ-2 QTS 



HDJ-1 
HSJ1 



* = amino acid identity in all 4 proteins 
. = conservative substitution 
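The legend separates columns identical in all four proteins from conservative substitutions but does not say which substitutions count as conservative. A sketch of how such a marker line can be generated, assuming the Clustal "strong" residue groups as the definition of a conservative substitution (an assumption, not the authors' stated criterion); the example rows are toy sequences, not rows taken from the figure, and `conservation_line` is an illustrative name.

```python
# Assumed "conservative" groups: the Clustal "strong" residue groups. The
# specification does not state which substitution groups were used.
CONSERVATIVE_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]


def conservation_line(rows: list[str]) -> str:
    """Build a marker line for equal-length aligned rows.

    '*' = the same residue in every row; '.' = all residues fall in one
    assumed conservative group; ' ' = otherwise, or any row has a gap ('-').
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must have equal length")
    marks = []
    for column in zip(*rows):
        residues = set(column)
        if "-" in residues:
            marks.append(" ")
        elif len(residues) == 1:
            marks.append("*")
        elif any(residues <= set(group) for group in CONSERVATIVE_GROUPS):
            marks.append(".")
        else:
            marks.append(" ")
    return "".join(marks)


# Toy four-row alignment (illustrative only, not taken from the figure).
rows = ["YYELL", "YYDVL", "YYQTL", "YYEIL"]
for row in rows:
    print(row)
print(conservation_line(rows))  # **. *
```

A stricter or looser grouping changes only which columns receive ".", never the "*" columns, which depend on identity alone.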
FIGURE 25



CAAGGAGCCTCTGCCTGCCCGTCGTCGTCATGCCGTCCCTGTTGCTCCAGCTGCCCCTGC 60

MPSLLLQLPL 10 
GCCTATGCCGGCTGTGGCCGCATAGCCTTTCCATCCGACTTCTCACAGCCGCCACAGGGC 120 
RLCRLWPHSLS IRLLTAATG 30 
AGCGGTCTGTCCCTACTAATTACTATGAATTCTTGGGCGTGCATCCGGGTGCCAGCGCTG 180 
QRSVPTNYYEL LGVHPGASA 50 
AAGAGATTAAACGTGCTTTTTTCACCAAGTCAAAAGAGCTACACCCTGATCGAGACCCTG 240 
EE I KRAFFTK S KELHPD RDP 70 
GGAACCCAGCCCTGCATAGCCGCTTTGTGGAGCTGAATGAGGCATATCGAGTGCTCAGTC 300
GN PA LHSRFV E LNEAYRVLS 90 
GTGAGGAAAGTCGTCGTAACTATGACCACCAGCTGCATTCAGCCAGTCCTCCAAAGTCTT 360 
REESRRNYDHQLHSASPPK S 110 
CAGGGAGCACAGCCGAGCCTAAGTATACGCAACAGACACACAGCAGCTCCTGGGAACCCC 420 
SGSTAEPKYTQQTHSSSWEP 130 
CCAACGCTCAATACTGGGCCCAGTTCCACAGTGTGAGGCCGCAGGGGCCGGAGTCAAGGA 480 
PNAQYWAQFHSVRPQGPESR 150
AGCAGCAGCGTAAACACAACCAGCGGGTCCTGGGGTACTGCCTCCTGCTCATGGTGGCAG 540
RQQRKHNQRVLGYCLLLMVA 170 
GCATGGGCCTGCACTATGTTGCCTTCAGGAAGCTGGAGCAGGTGCATCGCAGCTTCATGG 600 
GMGLHYVAFRKLEQVHRSFM 190 
ATGAAAAGGACCGGATCATTACAGCCATCTACAATGACAGTCGGGCCAGGGCCAGGGCCA 660
DEKDRIITAIYNDTRARARA 210
ACAGAGCCAGGATTCAGCACGAGCGCCAf?GAGAGGCAGCAGCCTCGGGCAGAACCCTCCC 720 
NRARIQQERHERQQPRAEPS 230 
TGCCTCCAGAAAGCTCCAGGATCATGCCCCAGGACACAAGCCCCTGAGAGGCTTAACTAA 780 

LPPESSRIMPQDTSP* 245
ATGGGACCTTCATTGGTCCTCTCCCTGCTGCCTGTCCAGAACTACACGTGCAATAAACTC 840 

849 

ATTTTCAG (A) n 
FIGURE 26



human MCG18 MPPLL PUttXJUJHPfOTPSIUUZ^AAOQR^

mouse MCG18 mPSLIJjQIJIJuXRLWPHSI^IMX

»• •••#«*•+• *♦* *••••••••••#,••.•••••. •

human MCG18 STCEUlPDW>PCOTSUiSfU^^
mouse MCG18 SKEIJIPDRDPCOTAUISRFV^

human MCG18 HfflHSS-V/TPFNAC^SQFttSV^^
mouse MCG18 QffTHSSSWEPPNAQVWAQ^^

human MCG18 KVKQMHLNFKDEKDRI ITAFYNEARARARftNRGII^E^QIUXK3RQPPP^PTQGPE-- -

mouse MCG18 K2-BQVHRSFMDEKDRI ITAIYNDTRARARANRARIQQER- - - HERQQPKAEPSLPPESSK

• * • •*•**••*••• •••♦*••••• .*•** * .**. **

human MCG18 IVPRGAGP
mouse MCG18 IMPQDTSP
FIGURE 27



ttgaagtctagccccatcctggtccaatgcgctcttggtagcctcctxtcccagctgccc 60 
*SLAPSWSNALLVASFPSCP

gcccgccgccATGCCGCCCTTACTGCCCCTGCGCCTGTGCCGGCTGTGGCCCCGCAACCC 120 
PAAMPPLLPLRLCRLWPRNP> 